On 12/22/09 1:36 PM, "Bill Graham" <billgra...@gmail.com> wrote:

> I've written my own Processor to handle my log format per this wiki and I've
> run into a couple of gotchas:
> http://wiki.apache.org/hadoop/DemuxModification
> 
> 1. The default processor is not the TsProcessor as documented, but the
> DefaultProcessor (see line 83 of Demux.java). This causes headaches because
> when using DefaultProcessor, data always goes under minute "0" in HDFS,
> regardless of when in the hour it was created.
> 

There is a generic method to build the record, like:

buildGenericRecord(record, recordEntry, timestamp, recordType);

This method builds a key of the form:

Time partition/Primary Key/timestamp

When all records are rolled up into a large sequence file at the end of the
hour and the end of the day, the sequence file is sorted by time partition
and primary key.  This layout was put in place to make data scanning easier.
When retrieving data, use record.getTimestamp() to find the real timestamp
of the record.
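
As a rough illustration of the key layout described above, here is a
minimal, self-contained sketch.  It is not the Chukwa implementation:
the class RecordKeySketch and its methods are hypothetical names, and it
assumes the time partition is the record timestamp truncated to the start
of its hour (in UTC).

```java
import java.util.Calendar;
import java.util.TimeZone;

// Hypothetical sketch, not Chukwa code: composes a key shaped like
// "Time partition/Primary Key/timestamp".
public class RecordKeySketch {

    // Assumption: the time partition is the timestamp truncated to the hour.
    public static long hourPartition(long timestampMillis) {
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        cal.setTimeInMillis(timestampMillis);
        cal.set(Calendar.MINUTE, 0);
        cal.set(Calendar.SECOND, 0);
        cal.set(Calendar.MILLISECOND, 0);
        return cal.getTimeInMillis();
    }

    // Joins the three parts with '/'; the full timestamp stays in the key
    // so records inside one partition remain distinguishable.
    public static String buildKey(long timestampMillis, String primaryKey) {
        return hourPartition(timestampMillis) + "/" + primaryKey + "/"
                + timestampMillis;
    }

    public static void main(String[] args) {
        // The primary key "host1" is an illustrative value only.
        System.out.println(buildKey(System.currentTimeMillis(), "host1"));
    }
}
```

Because every record from the same hour shares the same leading partition
value, sorting on such a key groups an hour's records together, while the
exact time is still available from the record itself.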

TsProcessor is incomplete for now because the key in ChukwaRecord is used
in the hourly and daily rollups.  Without buildGenericRecord, the hourly and
daily rollups will not work correctly.

> 2. When implementing a custom parser as shown in the wiki, how do you register
> the class so it gets included in the job that's submitted to the hadoop
> cluster? The only way I've been able to do this is to put my class in the
> package org.apache.hadoop.chukwa.extraction.demux.processor.mapper and then
> manually add that class to the chukwa-core-0.3.0.jar that is on my data
> processor, which is a pretty rough hack. Otherwise, I get class not found
> exceptions in my mapper.

The demux process is controlled by $CHUKWA_HOME/conf/chukwa-demux-conf.xml,
which maps the recordType to your parser class.  There is a plan to load
parser classes from the classpath using Java annotations.  It is still in
the initial phase of planning.  Design participation is welcome.  Hope this
helps.  :)
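
To make the recordType-to-parser mapping concrete, here is a small
stand-alone sketch of the general idea: look up a class name by record
type and instantiate it by reflection.  It is only an illustration, not
how Demux is actually written.  ParserRegistrySketch, Parser,
DefaultParser and MyLogParser are all hypothetical names, and a plain Map
stands in for the configuration file.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Chukwa code: resolves a parser class from a
// recordType -> class name mapping, falling back to a default parser.
public class ParserRegistrySketch {

    public interface Parser {
        String parse(String line);
    }

    public static class DefaultParser implements Parser {
        public String parse(String line) { return line; }
    }

    public static class MyLogParser implements Parser {
        public String parse(String line) { return line.trim(); }
    }

    // The named class must be loadable at this point; if it is not, the
    // lookup fails with ClassNotFoundException (wrapped here).
    public static Parser parserFor(Map<String, String> conf, String recordType) {
        String className =
                conf.getOrDefault(recordType, DefaultParser.class.getName());
        try {
            return (Parser) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load parser " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("MyLog", MyLogParser.class.getName());
        System.out.println(parserFor(conf, "MyLog").getClass().getName());
        System.out.println(parserFor(conf, "Other").getClass().getName());
    }
}
```

In a lookup of this kind, the class named in the mapping has to be
loadable wherever the lookup runs; otherwise it fails with a class not
found error.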

Regards,
Eric
