For aggregation, I am betting my money on Pig+HBase. The Pig team has recently completed HBase 0.20.6 integration, so Pig can load data from and store data into HBase. This enables us to write Pig Latin functions to slice and dice the data. The feature is available in Pig trunk (0.8), so I am waiting for a Pig release before incorporating aggregation tool chains into Chukwa. You are welcome to start experimenting with Pig+HBase.
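To make "slice and dice" concrete, here is a minimal Python sketch of the kind of rollup an aggregation job would compute: raw samples grouped by host, metric and time bucket, then averaged, roughly what a Pig Latin GROUP ... BY followed by AVG expresses. It is not Pig or HBase code; the (host, metric, timestamp, value) sample layout, the rollup function and the 300-second bucket are hypothetical choices for illustration.

```python
from collections import defaultdict
from statistics import mean


def rollup(samples, bucket_seconds=300):
    """Average raw samples into fixed-width time buckets.

    Hypothetical input layout: iterable of (host, metric, timestamp, value).
    Returns {(host, metric, bucket_start): average_value}.
    """
    groups = defaultdict(list)
    for host, metric, ts, value in samples:
        # Floor the timestamp to the start of its bucket (the GROUP BY key).
        bucket = ts - ts % bucket_seconds
        groups[(host, metric, bucket)].append(value)
    # Reduce each group to its average (the AVG step).
    return {key: mean(values) for key, values in sorted(groups.items())}
```

For example, samples at t=0 and t=120 land in the same 5-minute bucket and are averaged, while a sample at t=310 starts the next bucket. A real Pig job would do the same grouping in parallel over rows loaded from HBase.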
See: https://issues.apache.org/jira/browse/PIG-1205

To polish the HICC-related issues:

1) The only changes needed for full integration are to remove the JDBC code and to change the *.descriptor files to use the metrics REST API for graphing.
2) Export is currently a browser (client-side) operation. We don't store PNG files on the server, and I don't plan to create a static image repository in HICC, so there is no static URL for fetching the images. Adding one would be possible, and trivial, with a server-side graphing library.
3) Aggregation is going to be a series of Pig jobs; I am waiting for Pig 0.8 to be released.
4) I am not good at writing documentation. Help is always welcome.

Once we have a set of Pig scripts to replace the current, outdated aggregation scripts, we should look at Mahout to see whether it is useful for implementing AI algorithms to predict cluster failure. Signs of a cluster crash appear many minutes before the crash itself. My team had an intern who applied a one-class SVM classification algorithm to predict Hadoop failures. This was done at small scale, with training on a single machine. The same algorithm could be implemented using Mahout to refine Hadoop error-prediction algorithms.

regards,
Eric

On Thu, Nov 4, 2010 at 4:10 PM, Ariel Rabkin <[email protected]> wrote:
> Hi all.
>
> Want to report back on some preliminary efforts here at Berkeley to
> use HICC+HBase.
>
> So the good news is, it works. We're able to get data from adaptors,
> to collectors, into HBase, and then draw graphs of it.
> Many thanks to Eric for helping us debug.
>
> A lot of the pain involved disentangling us from HBase 0.89. That
> now seems to be mostly done.
>
> Now the rough edges.
>
> 1) The graphs-from-HBase don't seem to be at all integrated with the
> rest of HICC; it's a separate jsp.
> 2) Lots of graphical rough edges. E.g., the export button leaves
> incredible gunk in the address bar, without producing a usable URL.
> 3) No aggregates.
> 4) No documentation.
>
> Eric, what was the strategy you had in mind for aggregates, and
> integrating with the rest of HICC?
> This has now become a priority for the lab, so we have manpower to
> throw at it, but want to make optimal use of it.
>
> --Ari
>
> --
> Ari Rabkin [email protected]
> UC Berkeley Computer Science Department
