Aaron

You can get the details of how much data each mapper processed, and on which node (IP address, actually), from the job logs.
Raj

> ________________________________
> From: Ajay Srivastava <ajay.srivast...@guavus.com>
> To: "<common-user@hadoop.apache.org>" <common-user@hadoop.apache.org>
> Cc: "<core-u...@hadoop.apache.org>" <core-u...@hadoop.apache.org>
> Sent: Thursday, March 29, 2012 5:57 PM
> Subject: Re: How to modify hadoop-wordcount example to display File-wise results.
>
> Hi Aaron,
> I guess it can be done using counters.
> You can define a counter for each node in your cluster and then, in the map
> method, increment a node-specific counter after checking the hostname or IP
> address. It's not a very good solution, as you will need to modify your code
> whenever a node is added to or removed from the cluster, and there will be as
> many if conditions in the code as there are nodes. You can try this out if
> you do not find a cleaner solution. I wish this counter were part of the
> predefined counters.
>
> Regards,
> Ajay Srivastava
>
> On 30-Mar-2012, at 12:49 AM, aaron_v wrote:
>
>> Hi people, I am new to Nabble and Hadoop. I was having a look at the
>> wordcount program. Can someone please let me know how to find which data
>> gets mapped to which node? That is, I have a master node 0 and four other
>> nodes 1-4, and I ran the wordcount successfully, but I would like to
>> print, for each node, how much data it got from the input data file. Any
>> suggestions?
>>
>> us latha wrote:
>>>
>>> Hi,
>>>
>>> Inside the map method, I made the following change to Example: WordCount
>>> v1.0 at http://hadoop.apache.org/core/docs/current/mapred_tutorial.html
>>> ------------------
>>> String filename = new String();
>>> ...
>>> filename = ((FileSplit) reporter.getInputSplit()).getPath().toString();
>>> while (tokenizer.hasMoreTokens()) {
>>>     word.set(tokenizer.nextToken() + " " + filename);
>>> --------------------
>>>
>>> Worked great!! Thanks to everyone!
>>>
>>> Regards,
>>> Srilatha
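[Editor's note: a minimal sketch of Ajay's counter-per-node idea above, using the old mapred API. Rather than one hard-coded counter and if branch per node, dynamically named string counters keyed by hostname avoid touching the code when the cluster changes. The class name, counter group name, and hostname lookup below are illustrative, not from the thread.]

--------------------------
import java.io.IOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class NodeAwareWordCountMap extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();
  private String host;           // hostname of the node running this map task

  public void configure(JobConf job) {
    try {
      host = InetAddress.getLocalHost().getHostName();
    } catch (UnknownHostException e) {
      host = "unknown";
    }
  }

  public void map(LongWritable key, Text value,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    // One dynamically named counter per node, keyed by hostname, so nothing
    // needs to change when nodes join or leave the cluster.
    reporter.incrCounter("BytesMappedPerNode", host, value.getLength());
    StringTokenizer tokenizer = new StringTokenizer(value.toString());
    while (tokenizer.hasMoreTokens()) {
      word.set(tokenizer.nextToken());
      output.collect(word, ONE);
    }
  }
}
--------------------------

The counter totals show up in the JobTracker web UI and the job logs, which also lines up with Aaron's suggestion at the top of the thread.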
>>> On Sat, Oct 18, 2008 at 6:24 PM, Latha <usla...@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> Thank you for your valuable inputs suggesting possible ways of creating
>>>> an index file with the following format:
>>>> word1 filename count
>>>> word2 filename count
>>>>
>>>> However, the following is not working for me. Please help me resolve it.
>>>>
>>>> --------------------------
>>>> public static class Map extends MapReduceBase implements
>>>>     Mapper<LongWritable, Text, Text, Text> {
>>>>   private Text word = new Text();
>>>>   private Text filename = new Text();
>>>>
>>>>   public void map(LongWritable key, Text value,
>>>>       OutputCollector<Text, Text> output, Reporter reporter)
>>>>       throws IOException {
>>>>     filename.set(((FileSplit) reporter.getInputSplit()).getPath().toString());
>>>>     String line = value.toString();
>>>>     StringTokenizer tokenizer = new StringTokenizer(line);
>>>>     while (tokenizer.hasMoreTokens()) {
>>>>       word.set(tokenizer.nextToken());
>>>>       output.collect(word, filename);
>>>>     }
>>>>   }
>>>> }
>>>>
>>>> public static class Reduce extends MapReduceBase implements
>>>>     Reducer<Text, Text, Text, Text> {
>>>>   public void reduce(Text key, Iterator<Text> values,
>>>>       OutputCollector<Text, Text> output, Reporter reporter)
>>>>       throws IOException {
>>>>     int sum = 0;
>>>>     Text filename;
>>>>     while (values.hasNext()) {
>>>>       sum++;
>>>>       filename.set(values.next().toString());
>>>>     }
>>>>     String file = filename.toString() + " " + (new IntWritable(sum)).toString();
>>>>     filename = new Text(file);
>>>>     output.collect(key, filename);
>>>>   }
>>>> }
>>>> --------------------------
>>>>
>>>> 08/10/18 05:38:25 INFO mapred.JobClient: Task Id :
>>>> task_200810170342_0010_m_000000_2, Status : FAILED
>>>> java.io.IOException: Type mismatch in value from map: expected
>>>> org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text
>>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:427)
>>>>     at org.myorg.WordCount$Map.map(WordCount.java:23)
>>>>     at org.myorg.WordCount$Map.map(WordCount.java:13)
>>>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
>>>>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)
>>>>
>>>> Thanks
>>>> Srilatha
>>>>
>>>> On Mon, Oct 6, 2008 at 11:38 AM, Owen O'Malley <omal...@apache.org> wrote:
>>>>
>>>>> On Sun, Oct 5, 2008 at 12:46 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>>
>>>>>> What you need to do is snag access to the filename in the configure
>>>>>> method of the mapper.
>>>>>
>>>>> You can also do it in the map method with:
>>>>>
>>>>>     ((FileSplit) reporter.getInputSplit()).getPath()
>>>>>
>>>>>> Then instead of outputting just the word as the key, output a pair
>>>>>> containing the word and the file name as the key. Everything
>>>>>> downstream should remain the same.
>>>>>
>>>>> If you want to have each file handled by a single reduce, I'd suggest:
>>>>>
>>>>>     class FileWordPair implements Writable {
>>>>>       private Text fileName;
>>>>>       private Text word;
>>>>>       ...
>>>>>       public int hashCode() {
>>>>>         return fileName.hashCode();
>>>>>       }
>>>>>     }
>>>>>
>>>>> so that the HashPartitioner will send the records for file Foo to a
>>>>> single reducer. It would make sense to use this as an example of when
>>>>> to use grouping comparators (for getting a single call to reduce for
>>>>> each file) too...
>>>>>
>>>>> -- Owen
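[Editor's note: for readers hitting the same wall, the "Type mismatch in value from map" error above is a driver problem, not a mapper problem. WordCount v1.0 declares IntWritable as the job's output value class, and with the old mapred API the map output types default to the job output types, so the framework rejects the Text filename the map now emits. There is also a NullPointerException waiting in the posted reducer: `Text filename;` is declared but never initialized before `filename.set(...)`. Below is a sketch of the whole job with both issues addressed; the class and job names are illustrative, and the actual driver was never posted in the thread.]

--------------------------
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCountPerFile {

  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    private final Text word = new Text();
    private final Text filename = new Text();

    public void map(LongWritable key, Text value,
        OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
      filename.set(((FileSplit) reporter.getInputSplit()).getPath().toString());
      StringTokenizer tokenizer = new StringTokenizer(value.toString());
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, filename);
      }
    }
  }

  public static class Reduce extends MapReduceBase
      implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterator<Text> values,
        OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
      int sum = 0;
      String filename = "";   // was an uninitialized Text in the posted code
      while (values.hasNext()) {
        sum++;
        // Keeps only the last filename seen; correct only once the filename
        // is folded into the key, as Owen suggests above.
        filename = values.next().toString();
      }
      output.collect(key, new Text(filename + " " + sum));
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(WordCountPerFile.class);
    conf.setJobName("wordcount-per-file");

    // The crucial fix for "Type mismatch in value from map": declare the map
    // output types explicitly, since they no longer match WordCount v1.0's
    // IntWritable output value class that they would otherwise default to.
    conf.setMapOutputKeyClass(Text.class);
    conf.setMapOutputValueClass(Text.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);

    conf.setMapperClass(Map.class);
    conf.setReducerClass(Reduce.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}
--------------------------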
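[Editor's note: and here is Owen's FileWordPair filled in as a compilable sketch. Everything beyond the two fields and the hashCode he gives is an assumption on my part; in particular, a map output key must be a WritableComparable rather than a plain Writable, so the class implements that.]

--------------------------
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

public class FileWordPair implements WritableComparable<FileWordPair> {
  private final Text fileName = new Text();
  private final Text word = new Text();

  public FileWordPair() {}

  public void set(String file, String w) {
    fileName.set(file);
    word.set(w);
  }

  public void write(DataOutput out) throws IOException {
    fileName.write(out);
    word.write(out);
  }

  public void readFields(DataInput in) throws IOException {
    fileName.readFields(in);
    word.readFields(in);
  }

  // Sort by file first, then word, so each file's words arrive together.
  public int compareTo(FileWordPair other) {
    int cmp = fileName.compareTo(other.fileName);
    return (cmp != 0) ? cmp : word.compareTo(other.word);
  }

  // Partition on the file name only, so HashPartitioner sends every record
  // for a given file to the same reducer (Owen's point).
  public int hashCode() {
    return fileName.hashCode();
  }

  public boolean equals(Object o) {
    if (!(o instanceof FileWordPair)) return false;
    FileWordPair other = (FileWordPair) o;
    return fileName.equals(other.fileName) && word.equals(other.word);
  }

  public String toString() {
    return word + " " + fileName;
  }
}
--------------------------

Pairing this key with a grouping comparator that compares only fileName would give a single reduce() call per file, as Owen notes.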