Re: mapreduce linear chaining: ClassCastException

bejoy . hadoop Sat, 15 Oct 2011 12:09:04 -0700

Great!

Sorry about the KeyValueInputFormat slip: the class is KeyValueTextInputFormat. I was
replying from my handheld and recalling the class name from memory, so please excuse
me for that. :)


For your further requirements, such as descending order, I believe you will need to
work with a custom Comparator.
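In the old API, you register a sort comparator with JobConf.setOutputKeyComparatorClass. That comparator compares map output keys in their serialized form, and IntWritable serializes as a 4-byte big-endian int. The sketch below is plain Java with no Hadoop classes, and DescendingIntOrder and its methods are hypothetical names. It decodes two such encodings and reverses their natural order, which is the core of a descending comparator. In a real job, the same reversal would typically go inside a subclass of IntWritable.Comparator.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: simulates a descending "raw" comparator over keys
// encoded like IntWritable (4-byte big-endian int). Plain Java, no Hadoop.
class DescendingIntOrder {

    // Encode an int the way IntWritable.write() does (DataOutput.writeInt: big-endian).
    static byte[] encode(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Compare two serialized keys starting at the given offsets and
    // return the inverse of the natural ascending order.
    static int compareRaw(byte[] b1, int s1, byte[] b2, int s2) {
        int v1 = ByteBuffer.wrap(b1, s1, 4).getInt();
        int v2 = ByteBuffer.wrap(b2, s2, 4).getInt();
        return Integer.compare(v2, v1); // operands swapped => descending
    }

    // Sort ints through their serialized form, largest first.
    static List<Integer> sortDescending(List<Integer> values) {
        List<byte[]> keys = new ArrayList<>();
        for (int v : values) {
            keys.add(encode(v));
        }
        keys.sort((a, b) -> compareRaw(a, 0, b, 0));
        List<Integer> out = new ArrayList<>();
        for (byte[] k : keys) {
            out.add(ByteBuffer.wrap(k).getInt());
        }
        return out;
    }
}
```

Swapping the operands of Integer.compare, rather than negating a result, reverses the order without any risk of overflow.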

Thank you

Regards
Bejoy K S

-----Original Message-----
From: "Periya.Data" <[email protected]>
Date: Sat, 15 Oct 2011 10:59:00 
To: <[email protected]>; <[email protected]>
Subject: Re: mapreduce linear chaining: ClassCastException

Fantastic! Thanks very much, Bejoy. Now I am able to get the output of my MR-2
nicely. I had to convert the sum from Text format to IntWritable, and I now
get all the word frequencies as <Freq, Word> in ascending order. I used
"KeyValueTextInputFormat.class"; my program complained when I used
"KeyValueInputFormat".

Now let me investigate how to get them in descending order, and then the
top 20, etc. I know I must look into RawComparator and more...
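One option for the top-20 step, not one taken from this thread, is a bounded min-heap. It keeps only the N strongest <count, word> pairs, so the whole list never has to be held in sorted order. With a single reducer and a descending comparator, simply stopping after 20 records would also work. The sketch below is plain Java with hypothetical names. Breaking ties alphabetically is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper: keeps the N most frequent words using a bounded min-heap.
class TopNWords {

    // Returns up to n words with the highest counts, highest first.
    // Ties are broken alphabetically so the result is deterministic.
    static List<String> topN(Map<String, Integer> counts, int n) {
        // Head of the heap is the "weakest" entry: lowest count, and for
        // equal counts the alphabetically later word.
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(
            (a, b) -> {
                int byCount = Integer.compare(a.getValue(), b.getValue());
                return byCount != 0 ? byCount : b.getKey().compareTo(a.getKey());
            });
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // drop the current weakest entry
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result); // heap yields weakest first
        return result;
    }
}
```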

Thanks,
PD.

On Sat, Oct 15, 2011 at 1:08 AM, <[email protected]> wrote:

> Hi
> I believe this is what is happening in your case.
> The first MapReduce job runs to completion.
> When you trigger the second MapReduce job, it runs with the default input
> format, TextInputFormat, which always supplies keys of type LongWritable
> (the byte offset of each line) and values of type Text (the line itself).
> By default, a MapReduce job's output format is TextOutputFormat, which
> writes each key and value as text, separated by a tab. When another MR job
> needs to consume this output as key/value pairs, use KeyValueInputFormat;
> i.e., while setting the config parameters for the second job, set
> jobConf.setInputFormat(KeyValueInputFormat.class).
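The LongWritable comes from TextInputFormat. Each record's key is the byte offset at which the line starts, and the value is the line itself. The plain-Java sketch below mimics the pairs a mapper would receive from MR-1's output. TextInputRecords is a hypothetical name, and the sketch assumes UTF-8 text with newline line endings. A mapper whose input key is declared as Text therefore fails when the framework hands it one of these offsets.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the <key, value> pairs TextInputFormat gives a mapper.
class TextInputRecords {

    static final class Record {
        final long offset;  // would be the LongWritable key in Hadoop
        final String line;  // would be the Text value in Hadoop

        Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    // Assumes UTF-8 content with '\n' line endings.
    static List<Record> read(String fileContents) {
        List<Record> records = new ArrayList<>();
        if (fileContents.isEmpty()) {
            return records; // an empty file yields no records
        }
        long offset = 0;
        for (String line : fileContents.split("\n")) {
            records.add(new Record(offset, line));
            // advance past the line's bytes plus the '\n' terminator
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }
}
```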
> If your output key/value pairs use a separator other than the default tab,
> you also need to specify it for the second job using
> key.value.separator.in.input.line.
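As far as I know, KeyValueTextInputFormat (through KeyValueLineRecordReader) splits each line at the first occurrence of the separator. Everything before it becomes the key and everything after it becomes the value. A line with no separator becomes a key with an empty value. The plain-Java sketch below mimics that rule. Here `separator` stands in for the first character of key.value.separator.in.input.line (tab by default), and the class name is hypothetical.

```java
// Hypothetical sketch of the per-line key/value split rule.
class KeyValueSplit {

    // Returns {key, value}. Splits at the FIRST separator only, so the
    // value may itself contain further separators.
    static String[] split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos == -1) {
            // No separator: the whole line is the key, the value is empty.
            return new String[] {line, ""};
        }
        return new String[] {line.substring(0, pos), line.substring(pos + 1)};
    }
}
```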
>
> In short, for your case, doing the following in the second MapReduce job
> should get things in place:
> - use jobConf.setInputFormat(KeyValueInputFormat.class)
> - alter your mapper to accept keys and values of type Text, Text
> - swap the key and value within the mapper, with the necessary type
>   conversions, before emitting them to the reducer.
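The swap step amounts to turning a Text value such as "42" back into a number before emitting it as the key. In the old API, the mapper would typically be declared as Mapper<Text, Text, IntWritable, Text> and call output.collect(new IntWritable(count), word). The plain-Java sketch below isolates just the parse-and-swap logic. The names are hypothetical, and a Map.Entry stands in for the emitted pair. Skipping malformed counts (returning null) is an assumption of this sketch.

```java
import java.util.AbstractMap;
import java.util.Map;

// Hypothetical sketch of MR-2's map logic with a key/value text input:
// input <word, "count"> (both text), output <count, word>.
class SwapMapperLogic {

    // Returns the swapped pair, or null if the count is not a valid integer
    // (assumption: such records are skipped rather than failing the task).
    static Map.Entry<Integer, String> swap(String word, String countText) {
        try {
            int count = Integer.parseInt(countText.trim());
            return new AbstractMap.SimpleImmutableEntry<>(count, word);
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```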
>
> Note that, AFAIK, KeyValueInputFormat is not part of the new mapreduce
> API.
>
> Hope it helps.
>
> Regards
> Bejoy K S
>
> -----Original Message-----
> From: "Periya.Data" <[email protected]>
> Date: Fri, 14 Oct 2011 17:31:27
> To: <[email protected]>; <[email protected]>
> Reply-To: [email protected]
> Subject: mapreduce linear chaining: ClassCastException
>
> Hi all,
> I am trying a simple extension of the WordCount example in Hadoop. I want
> to get the word frequencies in descending order. To do that, I employ a
> linear chain of MR jobs. The first MR job (MR-1) does the regular word
> count (the usual example). For the next MR job (MR-2), I set the mapper to
> swap <word, count> to <count, word>, and then have the identity reducer
> simply store the results.
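Between MR-1 and MR-2, the data passes through plain text files. As far as I know, TextOutputFormat writes each record as the key's toString(), a tab, the value's toString(), and a newline. MR-1's IntWritable counts therefore reach MR-2 as characters, not as IntWritable objects. The plain-Java sketch below produces the same kind of text for a tiny input. The names are hypothetical, and the TreeMap is used only to make the output order predictable.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: how MR-1's <word, count> output looks on disk
// when written TextOutputFormat-style (key, tab, value, newline).
class WordCountTextOutput {

    static String write(String text) {
        // Count words, splitting on whitespace as the usual example does.
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        // Serialize each pair as text: the int count becomes characters.
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return out.toString();
    }
}
```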
>
> My MR-1 does its job correctly and stores the result in a temp path.
>
> Question 1: The mapper of the second MR job (MR-2) doesn't like the input
> format. I have declared the input types MapClass2 expects and the output
> types it must produce, yet it seems to be expecting a LongWritable. I
> suspect that it is trying to read some index file, but I am not sure.
>
>
> It throws an error like this:
>
> <code>
>    java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
> </code>
>
> Some info:
> - I use the old API (org.apache.hadoop.mapred.*). I have been asked to stick
>   with it for now.
> - I use hadoop-0.20.2
>
> For MR-1:
> - conf1.setOutputKeyClass(Text.class);
> - conf1.setOutputValueClass(IntWritable.class);
>
> For MR-2
> - takes in a Text (word) and IntWritable (sum)
> - conf2.setOutputKeyClass(IntWritable.class);
> - conf2.setOutputValueClass(Text.class);
>
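> <code>
> import java.io.IOException;
>
> import org.apache.hadoop.io.IntWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapred.MapReduceBase;
> import org.apache.hadoop.mapred.Mapper;
> import org.apache.hadoop.mapred.OutputCollector;
> import org.apache.hadoop.mapred.Reporter;
>
> public class MapClass2 extends MapReduceBase
>         implements Mapper<Text, IntWritable, IntWritable, Text> {
>
>     @Override
>     public void map(Text word, IntWritable sum,
>             OutputCollector<IntWritable, Text> output,
>             Reporter reporter) throws IOException {
>         output.collect(sum, word);   // emit <sum, word>
>     }
> }
> </code>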
>
> Any suggestions would be helpful. Is my MapClass2 code right in the first
> place for swapping? Or should I assume that the mapper reads line by line,
> so I must read in one line, use StrTokenizer to split it, and convert the
> second token (sum) from String to int? Or should I mess around with the
> OutputKeyComparator class?
>
> Thanks,
> PD
>
>
