Great! Sorry about the KeyValueInputFormat mix-up; the class is KeyValueTextInputFormat. I was replying from my handheld and getting the class name from memory, so excuse me for that. :)
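Below is a minimal plain-Java sketch of what the corrected MR-2 mapper does with each line of MR-1's output. It has no Hadoop dependency, so it runs on its own. It assumes TextOutputFormat's default tab separator, and the class and method names are hypothetical. In the real old-API job, the same logic would live in a mapper declared as Mapper<Text, Text, IntWritable, Text>, and the driver would call conf2.setInputFormat(KeyValueTextInputFormat.class).

```java
import java.util.AbstractMap;
import java.util.Map;

// Plain-Java model (no Hadoop classes) of the MR-2 map step once the job
// reads MR-1's output with KeyValueTextInputFormat. Names are hypothetical.
class SwapModel {

    // Mimics KeyValueTextInputFormat: split at the FIRST separator (tab by
    // default). With no separator, the whole line is the key and the value is "".
    static String[] splitKeyValue(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos == -1) {
            return new String[] {line, ""};
        }
        return new String[] {line.substring(0, pos), line.substring(pos + 1)};
    }

    // Mimics the corrected map(): input <Text word, Text count>, output
    // <count, word>. The count arrives as text and must be converted; in Hadoop
    // this would be new IntWritable(Integer.parseInt(value.toString())).
    // Throws NumberFormatException if the value is not a number.
    static Map.Entry<Integer, String> swap(String line) {
        String[] kv = splitKeyValue(line, '\t');
        int count = Integer.parseInt(kv[1].trim());
        return new AbstractMap.SimpleEntry<>(count, kv[0]);
    }

    public static void main(String[] args) {
        // One line as TextOutputFormat writes it in MR-1: word, tab, count.
        Map.Entry<Integer, String> e = swap("hadoop\t42");
        System.out.println(e.getKey() + "\t" + e.getValue()); // prints 42<TAB>hadoop
    }
}
```

This explains the original ClassCastException: with the default TextInputFormat, the key handed to the mapper is a LongWritable byte offset, not the word.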
For your further requirements, such as descending order, I believe you will need to play around with a Comparator.

Thank you
Regards
Bejoy K S

-----Original Message-----
From: "Periya.Data" <[email protected]>
Date: Sat, 15 Oct 2011 10:59:00
To: <[email protected]>; <[email protected]>
Subject: Re: mapreduce linear chaining: ClassCastException

Fantastic! Thanks much, Bejoy. Now I am able to get the output of my MR-2 nicely. I had to convert the sum (in Text format) to IntWritable, and now I get all the word frequencies as <Freq, Word> in ascending order. I used "KeyValueTextInputFormat.class"; my program was complaining when I used "KeyValueInputFormat".

Now, let me investigate how to do that in descending order... and then top-20, etc. I know I must look into RawComparator and more...

Thanks,
PD

On Sat, Oct 15, 2011 at 1:08 AM, <[email protected]> wrote:
> Hi
> I believe this is what is happening in your case.
> The first MapReduce job runs to completion.
> When you trigger the second MapReduce job, it is triggered with the
> default input format, TextInputFormat, which expects keys of type
> LongWritable (the byte offset of each line) and values of type Text.
> By default, a MapReduce job's output format is TextOutputFormat, which
> writes each key and value separated by a tab. When you need another MR job
> to consume this output as key-value pairs, use KeyValueInputFormat, i.e.
> while setting the config parameters for the second job, call
> jobConf.setInputFormat(KeyValueInputFormat.class).
> If your output key-value pairs use a separator other than the default
> tab, then for the second job you need to specify it as well, using
> key.value.separator.in.input.line.
>
> In short, for your case, doing the following in the second MapReduce job
> should get things in place:
> - use jobConf.setInputFormat(KeyValueInputFormat.class)
> - alter your mapper to accept keys and values of type Text, Text
> - swap the key and value within the mapper, converting the count to
>   IntWritable, before emitting to the reducer.
>
> To be noted here: AFAIK, KeyValueInputFormat is not a part of the new
> MapReduce API.
>
> Hope it helps.
>
> Regards
> Bejoy K S
>
> -----Original Message-----
> From: "Periya.Data" <[email protected]>
> Date: Fri, 14 Oct 2011 17:31:27
> To: <[email protected]>; <[email protected]>
> Reply-To: [email protected]
> Subject: mapreduce linear chaining: ClassCastException
>
> Hi all,
> I am trying a simple extension of the WordCount example in Hadoop. I want
> to get the word frequencies in descending order. To do that, I employ a
> linear chain of MR jobs. The first MR job (MR-1) does the regular word
> count (the usual example). For the next MR job, I set the mapper to swap
> <word, count> to <count, word>, then have the identity reducer simply
> store the results.
>
> My MR-1 does its job correctly and stores the result in a temp path.
>
> Question 1: The mapper of the second MR job (MR-2) doesn't like the input
> format. I have properly declared in MapClass2 the types it expects and
> the types its output must be, yet it seems to expect a LongWritable. I
> suspect that it is trying to look at some index file, but I am not sure.
>
> It throws an error like this:
>
> <code>
> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot
> be cast to org.apache.hadoop.io.Text
> </code>
>
> Some info:
> - I use the old API (org.apache.hadoop.mapred.*). I am asked to stick
>   with it for now.
> - I use hadoop-0.20.2
>
> For MR-1:
> - conf1.setOutputKeyClass(Text.class);
> - conf1.setOutputValueClass(IntWritable.class);
>
> For MR-2:
> - takes in a Text (word) and an IntWritable (sum)
> - conf2.setOutputKeyClass(IntWritable.class);
> - conf2.setOutputValueClass(Text.class);
>
> <code>
> public class MapClass2 extends MapReduceBase
>         implements Mapper<Text, IntWritable, IntWritable, Text> {
>
>     @Override
>     public void map(Text word, IntWritable sum,
>             OutputCollector<IntWritable, Text> output,
>             Reporter reporter) throws IOException {
>
>         output.collect(sum, word); // <sum, word>
>     }
> }
> </code>
>
> Any suggestions would be helpful. Is my MapClass2 code right in the first
> place for swapping? Or should I assume that the mapper reads line by line,
> so it must read in one line, then use StrTokenizer to split it up and
> convert the second token (sum) from str to int? Or should I mess around
> with the OutputKeyComparator class?
>
> Thanks,
> PD
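PD's follow-up goal (frequencies in descending order, then the top 20) comes down to reversing the sort order of the <count, word> keys. In the old API, that order is set with JobConf.setOutputKeyComparatorClass(...), typically pointing at a WritableComparator subclass that negates IntWritable's natural comparison. With conf2.setNumReduceTasks(1), a single reducer would then see counts from largest to smallest and could stop emitting after 20 records. The plain-Java sketch below models only that ordering and cutoff, not the Hadoop shuffle. The class name, the stable handling of ties, and the value of N are illustrative assumptions.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Plain-Java model of "sort <count, word> by count descending, keep the top N".
// Hypothetical names; in Hadoop the ordering would come from a custom key
// comparator and the cutoff from a single reducer.
class TopWordsModel {

    // Reverses the natural ascending order of the counts, which is what a
    // descending key comparator would do during the shuffle sort.
    static final Comparator<Map.Entry<Integer, String>> BY_COUNT_DESC =
            (a, b) -> Integer.compare(b.getKey(), a.getKey());

    // Returns at most n pairs with the highest counts. List.sort is stable,
    // so words with equal counts keep their input order.
    static List<Map.Entry<Integer, String>> topN(List<Map.Entry<Integer, String>> pairs, int n) {
        List<Map.Entry<Integer, String>> sorted = new ArrayList<>(pairs);
        sorted.sort(BY_COUNT_DESC);
        int limit = Math.max(0, Math.min(n, sorted.size()));
        return new ArrayList<>(sorted.subList(0, limit));
    }

    static Map.Entry<Integer, String> pair(int count, String word) {
        return new AbstractMap.SimpleEntry<>(count, word);
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> pairs = new ArrayList<>();
        pairs.add(pair(3, "map"));
        pairs.add(pair(10, "the"));
        pairs.add(pair(7, "hadoop"));
        pairs.add(pair(1, "reduce"));

        // Prints the two most frequent words: "10<TAB>the", then "7<TAB>hadoop".
        for (Map.Entry<Integer, String> e : topN(pairs, 2)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Letting the framework sort keys with a reversed comparator avoids buffering every pair in memory, which is why the Comparator route is suggested in the thread rather than sorting inside a reducer.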
