If you have a JobConf called xxJob, you can set the key/value types for the map phase as follows:

xxJob.setInputFormat(SequenceFileInputFormat.class);
xxJob.setInputKeyClass(UTF8.class);
xxJob.setInputValueClass(ArrayWritable.class);
Then set the key/value types for the reduce phase as follows:

xxJob.setOutputFormat(SequenceFileOutputFormat.class);
xxJob.setOutputKeyClass(LongWritable.class);
xxJob.setOutputValueClass(UTF8.class);

Hairong

-----Original Message-----
From: Teppo Kurki [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 29, 2006 7:04 AM
To: [email protected]
Subject: Different Key/Value classes for Map and Reduce?

Trying Hadoop out with a proof of concept, I came across the following problem.

My input looks conceptually like this:

textid1, number1, number2, number3
textid2, number2, number2, number3
textid1, number2, number5
...

I am interested in getting unique textid counts per number. Numbers are Longs.

My Mapper parses the values from the input lines and emits <LongWritable, UTF8> pairs like this:

number1, textid1
number2, textid1
number3, textid1
number1, textid2
number2, textid2
number3, textid2
number2, textid1
number5, textid1
...

and my Reducer counts unique textids per number and emits <LongWritable, IntWritable> pairs.

Is there a way to define different Key and Value classes separately for the Map and Reduce phases?

The easy workaround is to emit the counts as strings, but surely somebody has come across this kind of usage before. I have some more complicated analyses in mind that will call for more complex data structures to be handled separately.
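
(For concreteness, a rough sketch of the Mapper and Reducer described above might look like the following. It is untested and written against the non-generic org.apache.hadoop.mapred interfaces; the class names UniqueIdMapper and UniqueIdReducer are placeholders, and exact method signatures may differ between Hadoop versions, so treat it as pseudocode rather than a drop-in implementation.)

import java.io.IOException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.UTF8;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Parses a "textid, number1, number2, ..." line and emits one
// <LongWritable number, UTF8 textid> pair per number on the line.
class UniqueIdMapper implements Mapper {
  public void configure(JobConf job) {}
  public void close() throws IOException {}

  public void map(WritableComparable key, Writable value,
                  OutputCollector output, Reporter reporter) throws IOException {
    String[] fields = value.toString().split(",");
    UTF8 textId = new UTF8(fields[0].trim());
    for (int i = 1; i < fields.length; i++) {
      long number = Long.parseLong(fields[i].trim());
      output.collect(new LongWritable(number), textId);
    }
  }
}

// Collects the distinct textids seen for one number and emits a single
// <LongWritable number, IntWritable unique-textid-count> pair.
class UniqueIdReducer implements Reducer {
  public void configure(JobConf job) {}
  public void close() throws IOException {}

  public void reduce(WritableComparable key, Iterator values,
                     OutputCollector output, Reporter reporter) throws IOException {
    Set uniqueIds = new HashSet();
    while (values.hasNext()) {
      uniqueIds.add(values.next().toString());
    }
    output.collect(key, new IntWritable(uniqueIds.size()));
  }
}

Accumulating the textids into a HashSet is what makes the count a unique count rather than a plain occurrence count, which matches the behavior described in the original message.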
