Automated anomaly report and anonymous usage statistics collection
------------------------------------------------------------------

                 Key: HBASE-2333
                 URL: https://issues.apache.org/jira/browse/HBASE-2333
             Project: Hadoop HBase
          Issue Type: New Feature
            Reporter: Andrew Purtell
            Priority: Minor


Collection of anonymous usage data from users willing to participate can help 
the project in several ways:

- Characterization of typical workloads

- Long term trending of various performance metrics across releases

This could be done by having the master collect information from the region 
servers and itself over a 24-hour period and then send a report to a 
configured URL, for example a web service hosted under *.hbase.org. The 
information would be anonymized according to rules published on the wiki. Each 
master would identify itself via a GUID built in part from its MAC address. For 
the above items, only aggregated statistics are interesting, such as the number 
of ops/hour/server, where ops are such things as get, put, scanner.next, split, 
and compact. Over the same interval, the master would also sample HDFS metrics 
and system metrics (cpu, ram, wio). 
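
A minimal Java sketch of the two mechanical pieces described above: deriving a 
reporter GUID partly from a MAC address, and reducing raw per-server op counts 
to ops/hour/server. The class name, method names, and the local salt are 
assumptions made for illustration only; nothing here is an existing HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch; none of these names come from HBase. */
final class UsageReportSketch {

    /**
     * Builds a stable reporter id from a MAC address plus a locally stored
     * salt (the salt is an assumption), so the raw MAC never leaves the host.
     */
    static UUID reporterId(byte[] macAddress, String localSalt) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            sha256.update(macAddress);
            sha256.update(localSalt.getBytes(StandardCharsets.UTF_8));
            // Deterministic name-based UUID computed from the digest bytes.
            return UUID.nameUUIDFromBytes(sha256.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    /**
     * Reduces op counts collected per server over an interval to a single
     * aggregated figure per op type, expressed as ops/hour/server.
     *
     * @param countsByServer server name -> (op name -> count in the interval)
     * @param hours          length of the collection interval in hours
     */
    static Map<String, Double> opsPerHourPerServer(
            Map<String, Map<String, Long>> countsByServer, double hours) {
        // Sum each op type across all servers; server names are dropped here.
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Map<String, Long> serverCounts : countsByServer.values()) {
            for (Map.Entry<String, Long> e : serverCounts.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        int servers = countsByServer.size();
        Map<String, Double> rates = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            rates.put(e.getKey(), e.getValue() / (hours * servers));
        }
        return rates;
    }
}
```

The MAC bytes could come from java.net.NetworkInterface.getHardwareAddress(), 
which may return null on some interfaces, so a real implementation would need 
a fallback. Only the aggregated rates and the derived id would go into the 
report; the per-server map stays on the master.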

Later some more involved reporting activities can be considered:

- Trigger-based autonomous switch to DEBUG mode and log collection for some 
period -- like automated crash reports -- given failure or stress indications.

For this type of activity, table names, keys, and server names would be 
replaced with their hashes. The hashes are sufficient for event correlation 
during debugging, while the use of cryptographically strong one-way functions 
fully obfuscates details of the application. 
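
One way to sketch this replacement in Java is a keyed hash (HMAC-SHA256) with 
a secret that never leaves the cluster. The keyed construction is an 
assumption on top of the proposal, chosen because an unkeyed hash of a short 
table name could be guessed by hashing candidate names. The class and method 
names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch; not an existing HBase class. */
final class IdentifierObfuscatorSketch {
    private final SecretKeySpec key;

    /** The secret is assumed to be generated once per cluster and kept local. */
    IdentifierObfuscatorSketch(byte[] clusterSecret) {
        this.key = new SecretKeySpec(clusterSecret, "HmacSHA256");
    }

    /**
     * Maps an identifier to a fixed-length hex token. The same (kind, value)
     * pair always yields the same token, which is what allows events in the
     * collected logs to be correlated without revealing the original name.
     *
     * @param kind  category such as "table", "key", or "server"
     * @param value the identifier to hide
     */
    String obfuscate(String kind, String value) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            mac.update(kind.getBytes(StandardCharsets.UTF_8));
            mac.update((byte) 0); // separator so ("ab","c") differs from ("a","bc")
            byte[] digest = mac.doFinal(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }
}
```

Because the token is deterministic for a given secret, two log lines that 
mention the same table still line up after obfuscation, while a report from a 
different cluster with a different secret produces unrelated tokens.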


