Hi,

I have a strange error with a Chukwa parser that I wrote. The reducer class implements org.apache.hadoop.chukwa.extraction.demux.processor.ReduceProcessor. In the map class I set:

    key.setReduceType(RawReducer.class.getName());
and in the reducer I have:

    @Override
    public String getDataType() {
        return this.getClass().getName();
    }

The demux conf redirects my data type to my mapper class. I packaged these classes in a jar in the hdfs:/../chukwa/demux folder, but the reducer does not execute. I get a map/reduce job with a map input of a few million bytes, but the map output bytes always equal 0, and no output is written to the repos directory or anywhere else under hdfs/../chukwa. I suspect the output is empty because demux cannot find my reducer. I have also tried putting these classes in the chukwa-core jar, with the same results. I have already written a successful mapper-only solution, but this time I need the reducer.

What am I doing wrong?

Thanks in advance,

On Wed, Mar 10, 2010 at 6:28 AM, Eric Yang <ey...@yahoo-inc.com> wrote:

> Hi Oded,
>
> Chukwa 0.3 does not support external class files. On TRUNK, you can
> create your own parser to run in demux. The parser class should extend
> org.apache.hadoop.chukwa.extraction.demux.processor.AbstractProcessor for
> a mapper, or implement
> org.apache.hadoop.chukwa.extraction.demux.processor.ReduceProcessor for
> a reducer. Edit CHUKWA_CONF/chukwa-demux-conf.xml, and reference the
> RecordType to your class names.
>
> After you have both class files and the chukwa-demux-conf.xml file, put
> your jar file in hdfs://namenode:port/chukwa/demux and the next demux job
> will pick up the parsers and run them automatically. Duplicate detection
> should be handled by your mapper or reducer class, or by a post-demux
> step; Chukwa does not offer duplicate detection currently. Hope this
> helps.
>
> Regards,
> Eric
>
>
> On 3/9/10 1:01 PM, "Oded Rosen" <o...@legolas-media.com> wrote:
>
> > Hi,
> >
> > I wonder if one can write an additional data process (in addition to
> > the Demux + Archiving processes).
> > The option of writing a plug-in demux class is available, but can I
> > write other processes of my own that run in parallel to the
> > demux+archiving, on the same data?
> > What does it take?
> > What classes should be inherited?
> > How do I configure it (e.g., tell Chukwa to apply it to every piece
> > of data)?
> > Do I have to deal with duplications myself?
> >
> > Thanks a lot,
> > -- Oded
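For reference, the chukwa-demux-conf.xml mapping Eric describes is a Hadoop-style configuration file. A minimal sketch, assuming the property name is the RecordType and the value is the parser class; the "RawLog" type and the com.example class name are hypothetical examples, not names from this thread:

```
<!-- chukwa-demux-conf.xml fragment (sketch): bind a RecordType to a
     custom demux mapper parser. "RawLog" and the class name are
     hypothetical; use your own record type and fully qualified class. -->
<property>
  <name>RawLog</name>
  <value>com.example.chukwa.RawLogMapper</value>
  <description>Demux mapper parser for the RawLog record type</description>
</property>
```

Check an existing entry in your CHUKWA_CONF/chukwa-demux-conf.xml to confirm the exact property convention for your version before copying this.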
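To show the shape of the reducer wiring being discussed, here is a non-runnable sketch (it needs the Chukwa and Hadoop jars on the classpath). The process(...) signature is an assumption based on trunk-era Chukwa and should be verified against your tree; the package and pass-through body are hypothetical:

```
// Sketch only: verify ReduceProcessor's exact method signatures in your
// Chukwa version before relying on this.
package com.example.chukwa;

import java.util.Iterator;

import org.apache.hadoop.chukwa.extraction.demux.processor.ReduceProcessor;
import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecord;
import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecordKey;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class RawReducer implements ReduceProcessor {

  // Demux matches this string against the ReduceType the mapper stored
  // via key.setReduceType(...), so the two values must agree exactly.
  @Override
  public String getDataType() {
    return this.getClass().getName();
  }

  @Override
  public void process(ChukwaRecordKey key, Iterator<ChukwaRecord> values,
      OutputCollector<ChukwaRecordKey, ChukwaRecord> output,
      Reporter reporter) {
    try {
      while (values.hasNext()) {
        // Hypothetical body: pass records through unchanged.
        output.collect(key, values.next());
      }
    } catch (Throwable e) {
      reporter.incrCounter("DemuxError", "count", 1);
    }
  }
}
```

Note that if the map output bytes are 0, the reducer never receives any input regardless of how it is registered, so the mapper's output path is worth checking first.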