Hi Oded,

Chukwa 0.3 does not support external class files. For TRUNK, you can write your own parser to run in demux. The parser class should extend org.apache.hadoop.chukwa.extraction.demux.processor.AbstractProcessor for a mapper, or implement org.apache.hadoop.chukwa.extraction.demux.processor.ReduceProcessor for a reducer. Then edit CHUKWA_CONF/chukwa-demux-conf.xml and map the RecordType to your class names.
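As a sketch of that configuration step: the record type name (MyLogType) and processor class (com.example.chukwa.MyLogProcessor) below are placeholders, and the exact property layout should be checked against the chukwa-demux-conf.xml that ships with your build. Assuming a Hadoop-style configuration file, the mapping might look like:

```xml
<!-- Hypothetical entry in CHUKWA_CONF/chukwa-demux-conf.xml:
     maps the RecordType emitted by your adaptor to your parser class. -->
<property>
  <name>MyLogType</name>
  <value>com.example.chukwa.MyLogProcessor</value>
  <description>Demux processor for MyLogType records (placeholder names)</description>
</property>
```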
Once you have both the class files and the chukwa-demux-conf.xml entry, put your jar in hdfs://namenode:port/chukwa/demux and the next demux job will pick the parsers up and run them automatically. Duplicate detection should be handled by your mapper or reducer class, or by a post-demux step; Chukwa does not currently offer duplicate detection. Hope this helps.

Regards,
Eric

On 3/9/10 1:01 PM, "Oded Rosen" <o...@legolas-media.com> wrote:

> Hi,
>
> I wonder whether one can write an additional data process (in addition to
> the Demux + Archiving processes).
> The option of writing a plug-in demux class is available, but can I write
> other processes of my own to run in parallel to the demux + archiving, on
> the same data?
> What does it take?
> What classes should be inherited?
> How do I configure it (e.g., tell Chukwa to apply it to every piece of data)?
> Do I have to deal with duplicates myself?
>
> Thanks a lot,
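Since Chukwa leaves duplicate detection to your own mapper/reducer or a post-demux step, the core of that step is just remembering which record keys you have already seen. Here is a minimal, standalone sketch of that idea in plain Java; it deliberately uses no Chukwa APIs, and the "host/timestamp" key format is a made-up example, not anything Chukwa guarantees:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory duplicate filter. A real reducer or post-demux
// job would apply the same accept/reject logic to its record keys.
public class DedupSketch {
    private final Set<String> seen = new HashSet<>();

    // Returns true the first time a record key is seen, false on repeats.
    // Set.add() returns false when the element was already present.
    public boolean accept(String recordKey) {
        return seen.add(recordKey);
    }

    public static void main(String[] args) {
        DedupSketch dedup = new DedupSketch();
        System.out.println(dedup.accept("host1/1268168473")); // prints true
        System.out.println(dedup.accept("host1/1268168473")); // prints false (duplicate)
        System.out.println(dedup.accept("host2/1268168500")); // prints true
    }
}
```

Note that an in-memory set like this only works within a single task; in a MapReduce setting you would normally make the record key the map output key, so all copies of a duplicate arrive at the same reducer and can be collapsed there.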