Hi people, I'm new to Nabble and Hadoop. I was having a look at the wordcount example and ran it successfully on a cluster with a master node 0 and four other nodes 1-4. Can someone please let me know how to find out which data gets mapped to which node? That is, I would like to print, for each node, how much of the input data file it received. Any suggestions?
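One way to see this (a minimal sketch, not from this thread; it assumes the old org.apache.hadoop.mapred API used in the messages below, and the class name LoggingMap is my own) is to have each map task log the split it was handed together with the host it runs on. Summing the logged split lengths per host then tells you roughly how much input each node processed:

--------------------------
// Hypothetical mapper that logs which split (and how many bytes) this
// task processes, and on which host. Sketch only -- names are invented.
import java.io.IOException;
import java.net.InetAddress;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LoggingMap extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    private boolean logged = false;

    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        if (!logged) {
            // One split per map task; its length is the number of input
            // bytes this task (and hence this node) was assigned.
            FileSplit split = (FileSplit) reporter.getInputSplit();
            System.err.println("node=" + InetAddress.getLocalHost().getHostName()
                    + " file=" + split.getPath()
                    + " start=" + split.getStart()
                    + " length=" + split.getLength());
            logged = true;
        }
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}
--------------------------

Each task's stderr ends up in its task log, so the per-split lines can be collected from the tasktracker logs (or browsed through the JobTracker web UI) and totalled per node.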
us latha wrote:
> Hi,
>
> Inside the Map method, I made the following change to Example: WordCount v1.0
> (http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Example%3A+WordCount+v1.0)
> at http://hadoop.apache.org/core/docs/current/mapred_tutorial.html
> ------------------
> String filename = new String();
> ...
> filename = ((FileSplit) reporter.getInputSplit()).getPath().toString();
> while (tokenizer.hasMoreTokens()) {
>     word.set(tokenizer.nextToken() + " " + filename);
> --------------------
>
> Worked great!! Thanks to everyone!
>
> Regards,
> Srilatha
>
> On Sat, Oct 18, 2008 at 6:24 PM, Latha <usla...@gmail.com> wrote:
>
>> Hi All,
>>
>> Thank you for your valuable input suggesting possible ways of creating an
>> index file with the following format:
>>     word1 filename count
>>     word2 filename count
>>
>> However, the following is not working for me. Please help me resolve it.
>>
>> --------------------------
>> public static class Map extends MapReduceBase
>>         implements Mapper<LongWritable, Text, Text, Text> {
>>     private Text word = new Text();
>>     private Text filename = new Text();
>>
>>     public void map(LongWritable key, Text value,
>>             OutputCollector<Text, Text> output, Reporter reporter)
>>             throws IOException {
>>         filename.set(((FileSplit) reporter.getInputSplit()).getPath().toString());
>>         String line = value.toString();
>>         StringTokenizer tokenizer = new StringTokenizer(line);
>>         while (tokenizer.hasMoreTokens()) {
>>             word.set(tokenizer.nextToken());
>>             output.collect(word, filename);
>>         }
>>     }
>> }
>>
>> public static class Reduce extends MapReduceBase
>>         implements Reducer<Text, Text, Text, Text> {
>>     public void reduce(Text key, Iterator<Text> values,
>>             OutputCollector<Text, Text> output, Reporter reporter)
>>             throws IOException {
>>         int sum = 0;
>>         Text filename;
>>         while (values.hasNext()) {
>>             sum++;
>>             filename.set(values.next().toString());
>>         }
>>         String file = filename.toString() + " " + (new IntWritable(sum)).toString();
>>         filename = new Text(file);
>>         output.collect(key, filename);
>>     }
>> }
>> --------------------------
>>
>> 08/10/18 05:38:25 INFO mapred.JobClient: Task Id :
>> task_200810170342_0010_m_000000_2, Status : FAILED
>> java.io.IOException: Type mismatch in value from map: expected
>> org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text
>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:427)
>>     at org.myorg.WordCount$Map.map(WordCount.java:23)
>>     at org.myorg.WordCount$Map.map(WordCount.java:13)
>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
>>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)
>>
>> Thanks
>> Srilatha
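About the trace above: in this API the map output value class defaults to the job's final output value class, and the tutorial's wordcount driver declares IntWritable, so emitting Text values from the map trips the framework's type check. The driver (not shown in the thread) would need something like conf.setMapOutputValueClass(Text.class). The quoted reducer also calls set() on an uninitialized Text. A corrected sketch of the reducer, my reconstruction rather than the poster's final code:

--------------------------
public static class Reduce extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterator<Text> values,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        String filename = "";   // initialized before use; the quoted code
                                // calls set() on a null Text reference
        while (values.hasNext()) {
            sum++;
            filename = values.next().toString();
        }
        // Emit "word<TAB>filename count"
        output.collect(key, new Text(filename + " " + sum));
    }
}
--------------------------

Like the original, this keeps only the last filename seen for a key, so a word that occurs in several files gets a single mixed count. The key change that eventually worked (first message above) and the composite key Owen suggests below both avoid that.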
>> On Mon, Oct 6, 2008 at 11:38 AM, Owen O'Malley <omal...@apache.org> wrote:
>>
>>> On Sun, Oct 5, 2008 at 12:46 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>
>>> > What you need to do is snag access to the filename in the configure
>>> > method of the mapper.
>>>
>>> You can also do it in the map method with:
>>>
>>>     ((FileSplit) reporter.getInputSplit()).getPath()
>>>
>>> > Then instead of outputting just the word as the key, output a pair
>>> > containing the word and the file name as the key. Everything
>>> > downstream should remain the same.
>>>
>>> If you want to have each file handled by a single reduce, I'd suggest:
>>>
>>> class FileWordPair implements Writable {
>>>     private Text fileName;
>>>     private Text word;
>>>     ...
>>>     public int hashCode() {
>>>         return fileName.hashCode();
>>>     }
>>> }
>>>
>>> so that the HashPartitioner will send the records for file Foo to a
>>> single reducer. It would make sense to use this as an example for when
>>> to use grouping comparators (for getting a single call to reduce for
>>> each file) too...
>>>
>>> -- Owen
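For reference, a fleshed-out version of Owen's outline might look like the sketch below; this is my completion, not code from the thread. It assumes FileWordPair serves as the map output key, which in this API means implementing WritableComparable so the framework can sort keys; hashCode() uses only the file name so that HashPartitioner routes every record for a given file to the same reducer.

--------------------------
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

public class FileWordPair implements WritableComparable<FileWordPair> {
    private Text fileName = new Text();
    private Text word = new Text();

    public void set(String file, String w) {
        fileName.set(file);
        word.set(w);
    }

    public Text getFileName() { return fileName; }
    public Text getWord() { return word; }

    public void write(DataOutput out) throws IOException {
        fileName.write(out);
        word.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        fileName.readFields(in);
        word.readFields(in);
    }

    // Sort by file first, then word, so keys for one file are contiguous.
    public int compareTo(FileWordPair other) {
        int cmp = fileName.compareTo(other.fileName);
        return (cmp != 0) ? cmp : word.compareTo(other.word);
    }

    // Partition by file only, as Owen suggests.
    @Override
    public int hashCode() {
        return fileName.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FileWordPair)) return false;
        FileWordPair p = (FileWordPair) o;
        return fileName.equals(p.fileName) && word.equals(p.word);
    }
}
--------------------------

Pairing this with a grouping comparator that compares only fileName would give the single reduce call per file that Owen mentions.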