Hi Oded,

Chukwa 0.3 does not support external class files. For TRUNK, you can write your own parser to run in demux. The parser class should extend org.apache.hadoop.chukwa.extraction.demux.processor.AbstractProcessor for a mapper, or implement org.apache.hadoop.chukwa.extraction.demux.processor.ReduceProcessor for a reducer. Then edit CHUKWA_CONF/chukwa-demux-conf.xml and map the RecordType to your class names.
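As a sketch of that configuration step: the record type name (MyLogType) and processor class (com.example.chukwa.MyLogProcessor) below are placeholders, and the exact property layout should be checked against the chukwa-demux-conf.xml that ships with your build. Assuming a Hadoop-style configuration file, the mapping might look like:

```xml
<!-- Hypothetical entry in CHUKWA_CONF/chukwa-demux-conf.xml:
     maps the RecordType emitted by your adaptor to your parser class. -->
<property>
  <name>MyLogType</name>
  <value>com.example.chukwa.MyLogProcessor</value>
  <description>Demux processor for MyLogType records (placeholder names)</description>
</property>
```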
Once you have both the class files and the chukwa-demux-conf.xml entry, put your jar in hdfs://namenode:port/chukwa/demux and the next demux job will pick the parsers up and run them automatically. Duplicate detection should be handled by your mapper or reducer class, or by a post-demux step; Chukwa does not currently offer duplicate detection. Hope this helps.

Regards,
Eric

On 3/9/10 1:01 PM, "Oded Rosen" <o...@legolas-media.com> wrote:

> Hi,
>
> I wonder whether one can write an additional data process (in addition to
> the Demux + Archiving processes).
> The option of writing a plug-in demux class is available, but can I write
> other processes of my own to run in parallel to the demux + archiving, on
> the same data?
> What does it take?
> What classes should be inherited?
> How do I configure it (e.g., tell Chukwa to apply it to every piece of data)?
> Do I have to deal with duplicates myself?
>
> Thanks a lot,
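Since Chukwa leaves duplicate detection to your own mapper/reducer or a post-demux step, the core of that step is just remembering which record keys you have already seen. Here is a minimal, standalone sketch of that idea in plain Java; it deliberately uses no Chukwa APIs, and the "host/timestamp" key format is a made-up example, not anything Chukwa guarantees:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory duplicate filter. A real reducer or post-demux
// job would apply the same accept/reject logic to its record keys.
public class DedupSketch {
    private final Set<String> seen = new HashSet<>();

    // Returns true the first time a record key is seen, false on repeats.
    // Set.add() returns false when the element was already present.
    public boolean accept(String recordKey) {
        return seen.add(recordKey);
    }

    public static void main(String[] args) {
        DedupSketch dedup = new DedupSketch();
        System.out.println(dedup.accept("host1/1268168473")); // prints true
        System.out.println(dedup.accept("host1/1268168473")); // prints false (duplicate)
        System.out.println(dedup.accept("host2/1268168500")); // prints true
    }
}
```

Note that an in-memory set like this only works within a single task; in a MapReduce setting you would normally make the record key the map output key, so all copies of a duplicate arrive at the same reducer and can be collapsed there.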