Hi Oded,

Chukwa 0.3 does not support external class files.  On TRUNK, you can
create your own parser to run in demux.  The parser class should extend
org.apache.hadoop.chukwa.extraction.demux.processor.AbstractProcessor for
the mapper, or implement
org.apache.hadoop.chukwa.extraction.demux.processor.ReduceProcessor for
the reducer.  Then edit CHUKWA_CONF/chukwa-demux-conf.xml to map each
RecordType to your class names.
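A minimal mapper-side parser might look like the sketch below.  The
package, class name, and record type (com.example.chukwa, MyLogProcessor,
MyRecordType) are made up for illustration, and the exact location and
helper methods of AbstractProcessor can differ between trunk revisions,
so check them against your checkout:

```java
// Hypothetical custom parser for the demux mapper phase.
package com.example.chukwa;

import org.apache.hadoop.chukwa.extraction.demux.processor.mapper.AbstractProcessor;
import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecord;
import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecordKey;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class MyLogProcessor extends AbstractProcessor {

  @Override
  protected void parse(String recordEntry,
      OutputCollector<ChukwaRecordKey, ChukwaRecord> output,
      Reporter reporter) throws Throwable {
    // Placeholder: a real parser would extract the timestamp
    // from recordEntry itself.
    long timestamp = System.currentTimeMillis();

    ChukwaRecord record = new ChukwaRecord();
    // buildGenericRecord fills in the standard fields and the output key
    // for the given data type.
    buildGenericRecord(record, recordEntry, timestamp, "MyRecordType");
    record.add("rawLog", recordEntry);
    output.collect(key, record);
  }
}
```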

Once you have the class files and the chukwa-demux-conf.xml entries, put
your jar file in hdfs://namenode:port/chukwa/demux, and the next demux job
will pick up the parsers and run them automatically.  Duplicate detection
should be handled by your mapper or reducer class, or by a post-demux
step; Chukwa does not currently offer duplicate detection.  Hope this
helps.
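For reference, a chukwa-demux-conf.xml entry binding a RecordType to a
parser class typically looks like the following (the type and class names
here are placeholders matching the sketch above):

```xml
<!-- In CHUKWA_CONF/chukwa-demux-conf.xml: the property name is the
     RecordType, the value is the fully qualified parser class. -->
<property>
  <name>MyRecordType</name>
  <value>com.example.chukwa.MyLogProcessor</value>
  <description>Demux parser for MyRecordType data</description>
</property>
```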

Regards,
Eric



On 3/9/10 1:01 PM, "Oded Rosen" <o...@legolas-media.com> wrote:

> Hi,
> 
> I wonder if one can write an additional data process (in addition to the Demux
> + Archiving processes).
> The option of writing a plug-in demux class is available, but can I write
> other processes of my own that run in parallel to the demux + archiving, on
> the same data?
> What does it take?
> Which classes should be inherited?
> How do I configure it (e.g. tell Chukwa to apply it to every piece of data)?
> Do I have to handle duplicates myself?
> 
> Thanks a lot,
