It sounds like there is some exciting work being done on the demux process. I was just wondering whether you are planning to stay backwards compatible with the 0.3 format for /repos as you move forward.
Cheers
james

On 2010-02-26, at 10:38 AM, Eric Yang wrote:

> On 2/26/10 4:43 AM, "Guillermo Pérez" <bi...@tuenti.com> wrote:
>
>> One related thing is that I want to modify the "cluster" where we put
>> the files, because we will receive syslog data with several types of
>> events that we want to store in different clusters to analyze, back up,
>> and archive separately. I have seen that you can modify the
>> Record.tagsField and that we use a regexp for extracting the
>> destination cluster. This is a bit awkward, isn't it? I don't want to
>> keep a tagsField just for that. I'm using a field "event_type" and I
>> have modified extraction/engine/RecordUtil.java, so if that field
>> exists, "event_" + <event_type> will be used as the cluster. Is this
>> the proper way to go, or is there a better solution?
>
> I don't think you need to modify RecordUtil.java for this purpose. The
> backfill Java program takes the cluster as its first parameter. Hence,
> you could simply pass the event_type as the first parameter when you
> backfill.
>
>> Another question is where I could start looking at how to build
>> reports and aggregated results from the custom ChukwaRecords I'm
>> inserting.
>
> There is currently no formal solution for generating reports from
> ChukwaRecords. There is
> org.apache.hadoop.chukwa.dataloader.MetricDataLoader, which loads
> ChukwaRecords into a MySQL database based on the mdl.xml file. After the
> data is loaded, you could use hicc.sh to start the webserver and
> visualize the data in the Chukwa SQL Client widget. However, I must warn
> you that MetricDataLoader is deprecated, and the future plan for
> generating reports from ChukwaRecords is as follows:
>
> Have a post-demux data loader which waits to receive new ChukwaRecords
> files and merges them with the existing ChukwaRecords files through a
> second MR job. The second MR job also produces a low-resolution version
> of the data for reports.
>
> /chukwa/repos/TYPE/DATE <-- Original data goes here.
> /chukwa/report/TYPE/[yearly,monthly,weekly,daily] <-- Summarized JSON
> data goes here.
>
> The report JSON will be fixed to 300 data points per series, optimized
> for graphing. I am taking it slow on the actual implementation because
> ChukwaRecords should be moved to a faster serialization format. It's
> another area that needs to be improved for the future plan to work.
>
> Regards,
> Eric

James Seigel
ja...@tynt.com
http://www.tynt.com
Captain Hammer
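
For anyone trying to picture the "300 data points per series" idea from Eric's message, here is a rough sketch in Python. It is not Chukwa code. The bucket-averaging strategy, the function names downsample and report_path, and the (timestamp, value) tuple layout are all assumptions made for illustration; the thread does not say how the second MR job will actually summarize the data or what the report JSON will look like. Only the 300-point target and the /chukwa/report/TYPE/<resolution> layout come from the message above.

```python
from typing import List, Tuple

# From the thread: report series are fixed to 300 data points.
TARGET_POINTS = 300

# From the thread: the resolutions listed under /chukwa/report/TYPE/.
RESOLUTIONS = ("yearly", "monthly", "weekly", "daily")

Point = Tuple[float, float]  # (timestamp, value); hypothetical layout


def downsample(series: List[Point], target: int = TARGET_POINTS) -> List[Point]:
    """Reduce a series to at most `target` points by averaging buckets.

    Assumption: plain bucket averaging, keeping the first timestamp of
    each bucket. The real summarization method is not specified.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    n = len(series)
    if n <= target:
        # Already small enough; return a copy unchanged.
        return list(series)
    out: List[Point] = []
    for i in range(target):
        # Integer bucket boundaries; every bucket is non-empty when n > target.
        start = i * n // target
        end = (i + 1) * n // target
        bucket = series[start:end]
        avg = sum(value for _, value in bucket) / len(bucket)
        out.append((bucket[0][0], avg))
    return out


def report_path(data_type: str, resolution: str) -> str:
    """Build a report directory path following the layout in the thread."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unknown resolution: {resolution}")
    return f"/chukwa/report/{data_type}/{resolution}"
```

With 600 input points, downsample returns 300 points, each the average of two neighbours; a series that already has 300 points or fewer passes through untouched.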