Hi,

I have a strange error with a Chukwa parser that I wrote. The reducer class implements org.apache.hadoop.chukwa.extraction.demux.processor.ReduceProcessor. In the map class I set:

    key.setReduceType(RawReducer.class.getName());
and in the reducer I have:

    @Override
    public String getDataType() {
        return this.getClass().getName();
    }

The demux conf redirects my data type to my mapper class. I packaged these classes in a jar in the hdfs:/../chukwa/demux folder, but the reducer does not execute. I get a map/reduce job with a map input of a few million bytes, but the map output bytes always equal 0, and no output is written to the repos directory or anywhere else under hdfs/../chukwa. I suspect the output is empty because demux cannot find my reducer. I have also tried putting these classes in the chukwa-core jar, with the same results. I have already written a successful mapper-only solution, but this time I need the reducer.

What am I doing wrong?

Thanks in advance,

On Wed, Mar 10, 2010 at 6:28 AM, Eric Yang <ey...@yahoo-inc.com> wrote:

> Hi Oded,
>
> Chukwa 0.3 does not support external class files. On TRUNK, you can
> create your own parser to run in demux. The parser class should extend
> org.apache.hadoop.chukwa.extraction.demux.processor.AbstractProcessor for
> a mapper, or implement
> org.apache.hadoop.chukwa.extraction.demux.processor.ReduceProcessor for
> a reducer. Edit CHUKWA_CONF/chukwa-demux-conf.xml, and reference the
> RecordType to your class names.
>
> After you have both class files and the chukwa-demux-conf.xml file, put
> your jar file in hdfs://namenode:port/chukwa/demux and the next demux job
> will pick up the parsers and run them automatically. Duplicate detection
> should be handled by your mapper or reducer class, or by a post-demux
> step; Chukwa does not offer duplicate detection currently. Hope this
> helps.
>
> Regards,
> Eric
>
>
> On 3/9/10 1:01 PM, "Oded Rosen" <o...@legolas-media.com> wrote:
>
> > Hi,
> >
> > I wonder if one can write an additional data process (in addition to
> > the Demux + Archiving processes).
> > The option of writing a plug-in demux class is available, but can I
> > write other processes of my own that run in parallel to the
> > demux+archiving, on the same data?
> > What does it take?
> > What classes should be inherited?
> > How do I configure it (e.g., tell Chukwa to apply it to every piece
> > of data)?
> > Do I have to deal with duplications myself?
> >
> > Thanks a lot,
> > -- Oded
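For reference, the chukwa-demux-conf.xml mapping Eric describes is a Hadoop-style configuration file. A minimal sketch, assuming the property name is the RecordType and the value is the parser class; the "RawLog" type and the com.example class name are hypothetical examples, not names from this thread:

```
<!-- chukwa-demux-conf.xml fragment (sketch): bind a RecordType to a
     custom demux mapper parser. "RawLog" and the class name are
     hypothetical; use your own record type and fully qualified class. -->
<property>
  <name>RawLog</name>
  <value>com.example.chukwa.RawLogMapper</value>
  <description>Demux mapper parser for the RawLog record type</description>
</property>
```

Check an existing entry in your CHUKWA_CONF/chukwa-demux-conf.xml to confirm the exact property convention for your version before copying this.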
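To show the shape of the reducer wiring being discussed, here is a non-runnable sketch (it needs the Chukwa and Hadoop jars on the classpath). The process(...) signature is an assumption based on trunk-era Chukwa and should be verified against your tree; the package and pass-through body are hypothetical:

```
// Sketch only: verify ReduceProcessor's exact method signatures in your
// Chukwa version before relying on this.
package com.example.chukwa;

import java.util.Iterator;

import org.apache.hadoop.chukwa.extraction.demux.processor.ReduceProcessor;
import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecord;
import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecordKey;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class RawReducer implements ReduceProcessor {

  // Demux matches this string against the ReduceType the mapper stored
  // via key.setReduceType(...), so the two values must agree exactly.
  @Override
  public String getDataType() {
    return this.getClass().getName();
  }

  @Override
  public void process(ChukwaRecordKey key, Iterator<ChukwaRecord> values,
      OutputCollector<ChukwaRecordKey, ChukwaRecord> output,
      Reporter reporter) {
    try {
      while (values.hasNext()) {
        // Hypothetical body: pass records through unchanged.
        output.collect(key, values.next());
      }
    } catch (Throwable e) {
      reporter.incrCounter("DemuxError", "count", 1);
    }
  }
}
```

Note that if the map output bytes are 0, the reducer never receives any input regardless of how it is registered, so the mapper's output path is worth checking first.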