Aaron

You can get the details of how much data each mapper processed, and on which node (IP address, actually), from the job logs.
Raj

> ________________________________
> From: Ajay Srivastava <ajay.srivast...@guavus.com>
> To: "<common-user@hadoop.apache.org>" <common-user@hadoop.apache.org>
> Cc: "<core-u...@hadoop.apache.org>" <core-u...@hadoop.apache.org>
> Sent: Thursday, March 29, 2012 5:57 PM
> Subject: Re: How to modify hadoop-wordcount example to display File-wise results.
>
> Hi Aaron,
> I guess it can be done using counters.
> You can define a counter for each node in your cluster and then, in the map
> method, increment a node-specific counter after checking the hostname or IP
> address. It's not a very good solution, as you will need to modify your code
> whenever a node is added to or removed from the cluster, and there will be as
> many if conditions in the code as there are nodes. You can try this out if
> you do not find a cleaner solution. I wish this counter were part of the
> predefined counters.
>
> Regards,
> Ajay Srivastava
>
> On 30-Mar-2012, at 12:49 AM, aaron_v wrote:
>
>> Hi people, I am new to Nabble and Hadoop. I was having a look at the
>> wordcount program. Can someone please let me know how to find which data
>> gets mapped to which node? That is, I have a master node 0 and four other
>> nodes 1-4, and I ran the wordcount successfully, but I would like to
>> print, for each node, how much data it got from the input data file. Any
>> suggestions?
>>
>> us latha wrote:
>>>
>>> Hi,
>>>
>>> Inside the map method, I made the following change to Example: WordCount
>>> v1.0 at http://hadoop.apache.org/core/docs/current/mapred_tutorial.html
>>> ------------------
>>> String filename = new String();
>>> ...
>>> filename = ((FileSplit) reporter.getInputSplit()).getPath().toString();
>>> while (tokenizer.hasMoreTokens()) {
>>>     word.set(tokenizer.nextToken() + " " + filename);
>>> --------------------
>>>
>>> Worked great!! Thanks to everyone!
>>>
>>> Regards,
>>> Srilatha
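[Editor's note: a minimal sketch of Ajay's counter-per-node idea above, using the old mapred API. Rather than one hard-coded counter and if branch per node, dynamically named string counters keyed by hostname avoid touching the code when the cluster changes. The class name, counter group name, and hostname lookup below are illustrative, not from the thread.]

--------------------------
import java.io.IOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class NodeAwareWordCountMap extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();
  private String host;           // hostname of the node running this map task

  public void configure(JobConf job) {
    try {
      host = InetAddress.getLocalHost().getHostName();
    } catch (UnknownHostException e) {
      host = "unknown";
    }
  }

  public void map(LongWritable key, Text value,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    // One dynamically named counter per node, keyed by hostname, so nothing
    // needs to change when nodes join or leave the cluster.
    reporter.incrCounter("BytesMappedPerNode", host, value.getLength());
    StringTokenizer tokenizer = new StringTokenizer(value.toString());
    while (tokenizer.hasMoreTokens()) {
      word.set(tokenizer.nextToken());
      output.collect(word, ONE);
    }
  }
}
--------------------------

The counter totals show up in the JobTracker web UI and the job logs, which also lines up with Aaron's suggestion at the top of the thread.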
>>> On Sat, Oct 18, 2008 at 6:24 PM, Latha <usla...@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> Thank you for your valuable inputs suggesting possible ways of creating
>>>> an index file with the following format:
>>>> word1 filename count
>>>> word2 filename count
>>>>
>>>> However, the following is not working for me. Please help me resolve it.
>>>>
>>>> --------------------------
>>>> public static class Map extends MapReduceBase implements
>>>>     Mapper<LongWritable, Text, Text, Text> {
>>>>   private Text word = new Text();
>>>>   private Text filename = new Text();
>>>>
>>>>   public void map(LongWritable key, Text value,
>>>>       OutputCollector<Text, Text> output, Reporter reporter)
>>>>       throws IOException {
>>>>     filename.set(((FileSplit) reporter.getInputSplit()).getPath().toString());
>>>>     String line = value.toString();
>>>>     StringTokenizer tokenizer = new StringTokenizer(line);
>>>>     while (tokenizer.hasMoreTokens()) {
>>>>       word.set(tokenizer.nextToken());
>>>>       output.collect(word, filename);
>>>>     }
>>>>   }
>>>> }
>>>>
>>>> public static class Reduce extends MapReduceBase implements
>>>>     Reducer<Text, Text, Text, Text> {
>>>>   public void reduce(Text key, Iterator<Text> values,
>>>>       OutputCollector<Text, Text> output, Reporter reporter)
>>>>       throws IOException {
>>>>     int sum = 0;
>>>>     Text filename;
>>>>     while (values.hasNext()) {
>>>>       sum++;
>>>>       filename.set(values.next().toString());
>>>>     }
>>>>     String file = filename.toString() + " " + (new IntWritable(sum)).toString();
>>>>     filename = new Text(file);
>>>>     output.collect(key, filename);
>>>>   }
>>>> }
>>>> --------------------------
>>>>
>>>> 08/10/18 05:38:25 INFO mapred.JobClient: Task Id :
>>>> task_200810170342_0010_m_000000_2, Status : FAILED
>>>> java.io.IOException: Type mismatch in value from map: expected
>>>> org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text
>>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:427)
>>>>     at org.myorg.WordCount$Map.map(WordCount.java:23)
>>>>     at org.myorg.WordCount$Map.map(WordCount.java:13)
>>>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
>>>>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)
>>>>
>>>> Thanks
>>>> Srilatha
>>>>
>>>> On Mon, Oct 6, 2008 at 11:38 AM, Owen O'Malley <omal...@apache.org> wrote:
>>>>
>>>>> On Sun, Oct 5, 2008 at 12:46 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>>
>>>>>> What you need to do is snag access to the filename in the configure
>>>>>> method of the mapper.
>>>>>
>>>>> You can also do it in the map method with:
>>>>>
>>>>>     ((FileSplit) reporter.getInputSplit()).getPath()
>>>>>
>>>>>> Then instead of outputting just the word as the key, output a pair
>>>>>> containing the word and the file name as the key. Everything
>>>>>> downstream should remain the same.
>>>>>
>>>>> If you want to have each file handled by a single reduce, I'd suggest:
>>>>>
>>>>>     class FileWordPair implements Writable {
>>>>>       private Text fileName;
>>>>>       private Text word;
>>>>>       ...
>>>>>       public int hashCode() {
>>>>>         return fileName.hashCode();
>>>>>       }
>>>>>     }
>>>>>
>>>>> so that the HashPartitioner will send the records for file Foo to a
>>>>> single reducer. It would make sense to use this as an example of when
>>>>> to use grouping comparators (for getting a single call to reduce for
>>>>> each file) too...
>>>>>
>>>>> -- Owen
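[Editor's note: for readers hitting the same wall, the "Type mismatch in value from map" error above is a driver problem, not a mapper problem. WordCount v1.0 declares IntWritable as the job's output value class, and with the old mapred API the map output types default to the job output types, so the framework rejects the Text filename the map now emits. There is also a NullPointerException waiting in the posted reducer: `Text filename;` is declared but never initialized before `filename.set(...)`. Below is a sketch of the whole job with both issues addressed; the class and job names are illustrative, and the actual driver was never posted in the thread.]

--------------------------
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCountPerFile {

  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    private final Text word = new Text();
    private final Text filename = new Text();

    public void map(LongWritable key, Text value,
        OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
      filename.set(((FileSplit) reporter.getInputSplit()).getPath().toString());
      StringTokenizer tokenizer = new StringTokenizer(value.toString());
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, filename);
      }
    }
  }

  public static class Reduce extends MapReduceBase
      implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterator<Text> values,
        OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
      int sum = 0;
      String filename = "";   // was an uninitialized Text in the posted code
      while (values.hasNext()) {
        sum++;
        // Keeps only the last filename seen; correct only once the filename
        // is folded into the key, as Owen suggests above.
        filename = values.next().toString();
      }
      output.collect(key, new Text(filename + " " + sum));
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(WordCountPerFile.class);
    conf.setJobName("wordcount-per-file");

    // The crucial fix for "Type mismatch in value from map": declare the map
    // output types explicitly, since they no longer match WordCount v1.0's
    // IntWritable output value class that they would otherwise default to.
    conf.setMapOutputKeyClass(Text.class);
    conf.setMapOutputValueClass(Text.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);

    conf.setMapperClass(Map.class);
    conf.setReducerClass(Reduce.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}
--------------------------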
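[Editor's note: and here is Owen's FileWordPair filled in as a compilable sketch. Everything beyond the two fields and the hashCode he gives is an assumption on my part; in particular, a map output key must be a WritableComparable rather than a plain Writable, so the class implements that.]

--------------------------
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

public class FileWordPair implements WritableComparable<FileWordPair> {
  private final Text fileName = new Text();
  private final Text word = new Text();

  public FileWordPair() {}

  public void set(String file, String w) {
    fileName.set(file);
    word.set(w);
  }

  public void write(DataOutput out) throws IOException {
    fileName.write(out);
    word.write(out);
  }

  public void readFields(DataInput in) throws IOException {
    fileName.readFields(in);
    word.readFields(in);
  }

  // Sort by file first, then word, so each file's words arrive together.
  public int compareTo(FileWordPair other) {
    int cmp = fileName.compareTo(other.fileName);
    return (cmp != 0) ? cmp : word.compareTo(other.word);
  }

  // Partition on the file name only, so HashPartitioner sends every record
  // for a given file to the same reducer (Owen's point).
  public int hashCode() {
    return fileName.hashCode();
  }

  public boolean equals(Object o) {
    if (!(o instanceof FileWordPair)) return false;
    FileWordPair other = (FileWordPair) o;
    return fileName.equals(other.fileName) && word.equals(other.word);
  }

  public String toString() {
    return word + " " + fileName;
  }
}
--------------------------

Pairing this key with a grouping comparator that compares only fileName would give a single reduce() call per file, as Owen notes.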