If you have a JobConf called xxJob, you can set the key/value types for the map phase as follows:

xxJob.setInputFormat(SequenceFileInputFormat.class);
xxJob.setInputKeyClass(UTF8.class);
xxJob.setInputValueClass(ArrayWritable.class);
Then set the key/value types for the reduce phase as follows:

xxJob.setOutputFormat(SequenceFileOutputFormat.class);
xxJob.setOutputKeyClass(LongWritable.class);
xxJob.setOutputValueClass(UTF8.class);

Hairong

-----Original Message-----
From: Teppo Kurki [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 29, 2006 7:04 AM
To: [email protected]
Subject: Different Key/Value classes for Map and Reduce?

Trying Hadoop out with a proof of concept, I came across the following problem.

My input looks conceptually like this:

textid1, number1, number2, number3
textid2, number2, number2, number3
textid1, number2, number5
...

I am interested in getting unique textid counts per number. Numbers are Longs.

My Mapper parses the values from the input lines and emits <LongWritable, UTF8> pairs like this:

number1, textid1
number2, textid1
number3, textid1
number1, textid2
number2, textid2
number3, textid2
number2, textid1
number5, textid1
...

and my Reducer counts unique textids per number and emits <LongWritable, IntWritable> pairs.

Is there a way to define different Key and Value classes separately for the Map and Reduce phases?

The easy workaround is to emit the counts as strings, but surely somebody has come across this kind of usage before. I have some more complicated analyses in mind that will call for more complex data structures to be handled separately.
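
(For concreteness, a rough sketch of the Mapper and Reducer described above might look like the following. It is untested and written against the non-generic org.apache.hadoop.mapred interfaces; the class names UniqueIdMapper and UniqueIdReducer are placeholders, and exact method signatures may differ between Hadoop versions, so treat it as pseudocode rather than a drop-in implementation.)

import java.io.IOException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.UTF8;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Parses a "textid, number1, number2, ..." line and emits one
// <LongWritable number, UTF8 textid> pair per number on the line.
class UniqueIdMapper implements Mapper {
  public void configure(JobConf job) {}
  public void close() throws IOException {}

  public void map(WritableComparable key, Writable value,
                  OutputCollector output, Reporter reporter) throws IOException {
    String[] fields = value.toString().split(",");
    UTF8 textId = new UTF8(fields[0].trim());
    for (int i = 1; i < fields.length; i++) {
      long number = Long.parseLong(fields[i].trim());
      output.collect(new LongWritable(number), textId);
    }
  }
}

// Collects the distinct textids seen for one number and emits a single
// <LongWritable number, IntWritable unique-textid-count> pair.
class UniqueIdReducer implements Reducer {
  public void configure(JobConf job) {}
  public void close() throws IOException {}

  public void reduce(WritableComparable key, Iterator values,
                     OutputCollector output, Reporter reporter) throws IOException {
    Set uniqueIds = new HashSet();
    while (values.hasNext()) {
      uniqueIds.add(values.next().toString());
    }
    output.collect(key, new IntWritable(uniqueIds.size()));
  }
}

Accumulating the textids into a HashSet is what makes the count a unique count rather than a plain occurrence count, which matches the behavior described in the original message.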
