Thanks, Eric. I was mainly interested in the per-job/task info, similar to that provided by UserDailySummary.pig, because per-job/task info seems to be missing from what is stored in HBase. Any pointers on how to get a table of jobs into HBase are welcome :-)

-preetam

On Wed, Jul 6, 2011 at 9:12 PM, Eric Yang <[email protected]> wrote:
> Hi Preetam,
>
> ClusterSummary.pig is the only one that works with HBase. Other Pig
> scripts are designed to work on sequence files for Chukwa 0.4. The
> scripts were thrown together at crunch time. There is no
> documentation. The ChukwaLoader/ChukwaStore function needs to be
> revised to use HBase to bring the scripts up to date with Chukwa 0.5.
> Hadoop_*.pig scripts are for downsampling data from raw resolution
> into a specified time resolution, e.g. a 30-minute average or a
> 180-minute average. UserDailySummary.pig is designed to aggregate data
> from JobHistory log files to generate a user usage report. However,
> this was designed to work on JobHistory files for Hadoop 0.18. I don't
> think it works with Hadoop 0.20+ because the JobHistory format changed
> in Hadoop 0.20.
>
> regards,
> Eric
>
> On Wed, Jul 6, 2011 at 5:04 AM, Preetam Patil <[email protected]> wrote:
> > Hi,
> > I notice that there are a bunch of Pig scripts in the scripts/pig
> > directory, but only ClusterSummary.pig seems to be mentioned in the
> > documentation. The other scripts also seem to be based on a storage
> > model other than HBase, but provide more info (e.g., per-job/task
> > stats) than that stored in HBase.
> > Are these compatible with 0.5, and if not, what needs to be done to
> > get them working, and where can I find any API info for them?
> > Thanks,
> > -preetam
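
For readers unfamiliar with the downsampling that the Hadoop_*.pig scripts are described as doing (raw resolution into 30-minute or 180-minute averages), here is a minimal Python sketch of the general idea. It is not Chukwa code and is not taken from the Pig scripts: the function name `downsample`, the `(epoch_seconds, value)` sample layout, and the choice of a plain arithmetic mean per fixed window are all assumptions made for illustration.

```python
from collections import defaultdict


def downsample(samples, window_minutes=30):
    """Average (epoch_seconds, value) samples into fixed-size time windows.

    Hypothetical helper for illustration only; the real Pig scripts may
    group and aggregate differently.
    """
    window = window_minutes * 60  # window length in seconds
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in samples:
        bucket = ts - ts % window  # start of the window containing ts
        sums[bucket] += value
        counts[bucket] += 1
    # Map each window start time to the mean of the values that fell in it.
    return {bucket: sums[bucket] / counts[bucket] for bucket in sorted(sums)}


# Assumed raw samples: two in the first half hour, one in the second.
raw = [(0, 1.0), (600, 3.0), (1800, 10.0)]
half_hourly = downsample(raw, window_minutes=30)
three_hourly = downsample(raw, window_minutes=180)
```

With a 30-minute window the first two samples share a bucket and the third gets its own; with a 180-minute window all three collapse into a single average.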
