Hi people, I'm new to Nabble and Hadoop. I was having a look at the wordcount example and ran it successfully on a cluster with a master node 0 and four other nodes 1-4. Can someone please let me know how to find out which data gets mapped to which node? That is, I would like to print, for each node, how much of the input data file it received. Any suggestions?
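One way to see this (a minimal sketch, not from this thread; it assumes the old org.apache.hadoop.mapred API used in the messages below, and the class name LoggingMap is my own) is to have each map task log the split it was handed together with the host it runs on. Summing the logged split lengths per host then tells you roughly how much input each node processed:

--------------------------
// Hypothetical mapper that logs which split (and how many bytes) this
// task processes, and on which host. Sketch only -- names are invented.
import java.io.IOException;
import java.net.InetAddress;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LoggingMap extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    private boolean logged = false;

    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        if (!logged) {
            // One split per map task; its length is the number of input
            // bytes this task (and hence this node) was assigned.
            FileSplit split = (FileSplit) reporter.getInputSplit();
            System.err.println("node=" + InetAddress.getLocalHost().getHostName()
                    + " file=" + split.getPath()
                    + " start=" + split.getStart()
                    + " length=" + split.getLength());
            logged = true;
        }
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}
--------------------------

Each task's stderr ends up in its task log, so the per-split lines can be collected from the tasktracker logs (or browsed through the JobTracker web UI) and totalled per node.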
us latha wrote:
> Hi,
>
> Inside the Map method, I made the following change to Example: WordCount v1.0
> (http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Example%3A+WordCount+v1.0)
> at http://hadoop.apache.org/core/docs/current/mapred_tutorial.html
> ------------------
> String filename = new String();
> ...
> filename = ((FileSplit) reporter.getInputSplit()).getPath().toString();
> while (tokenizer.hasMoreTokens()) {
>     word.set(tokenizer.nextToken() + " " + filename);
> --------------------
>
> Worked great!! Thanks to everyone!
>
> Regards,
> Srilatha
>
> On Sat, Oct 18, 2008 at 6:24 PM, Latha <usla...@gmail.com> wrote:
>
>> Hi All,
>>
>> Thank you for your valuable input suggesting possible ways of creating an
>> index file with the following format:
>>     word1 filename count
>>     word2 filename count
>>
>> However, the following is not working for me. Please help me resolve it.
>>
>> --------------------------
>> public static class Map extends MapReduceBase
>>         implements Mapper<LongWritable, Text, Text, Text> {
>>     private Text word = new Text();
>>     private Text filename = new Text();
>>
>>     public void map(LongWritable key, Text value,
>>             OutputCollector<Text, Text> output, Reporter reporter)
>>             throws IOException {
>>         filename.set(((FileSplit) reporter.getInputSplit()).getPath().toString());
>>         String line = value.toString();
>>         StringTokenizer tokenizer = new StringTokenizer(line);
>>         while (tokenizer.hasMoreTokens()) {
>>             word.set(tokenizer.nextToken());
>>             output.collect(word, filename);
>>         }
>>     }
>> }
>>
>> public static class Reduce extends MapReduceBase
>>         implements Reducer<Text, Text, Text, Text> {
>>     public void reduce(Text key, Iterator<Text> values,
>>             OutputCollector<Text, Text> output, Reporter reporter)
>>             throws IOException {
>>         int sum = 0;
>>         Text filename;
>>         while (values.hasNext()) {
>>             sum++;
>>             filename.set(values.next().toString());
>>         }
>>         String file = filename.toString() + " " + (new IntWritable(sum)).toString();
>>         filename = new Text(file);
>>         output.collect(key, filename);
>>     }
>> }
>> --------------------------
>>
>> 08/10/18 05:38:25 INFO mapred.JobClient: Task Id :
>> task_200810170342_0010_m_000000_2, Status : FAILED
>> java.io.IOException: Type mismatch in value from map: expected
>> org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text
>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:427)
>>     at org.myorg.WordCount$Map.map(WordCount.java:23)
>>     at org.myorg.WordCount$Map.map(WordCount.java:13)
>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
>>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)
>>
>> Thanks
>> Srilatha
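About the trace above: in this API the map output value class defaults to the job's final output value class, and the tutorial's wordcount driver declares IntWritable, so emitting Text values from the map trips the framework's type check. The driver (not shown in the thread) would need something like conf.setMapOutputValueClass(Text.class). The quoted reducer also calls set() on an uninitialized Text. A corrected sketch of the reducer, my reconstruction rather than the poster's final code:

--------------------------
public static class Reduce extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterator<Text> values,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        String filename = "";   // initialized before use; the quoted code
                                // calls set() on a null Text reference
        while (values.hasNext()) {
            sum++;
            filename = values.next().toString();
        }
        // Emit "word<TAB>filename count"
        output.collect(key, new Text(filename + " " + sum));
    }
}
--------------------------

Like the original, this keeps only the last filename seen for a key, so a word that occurs in several files gets a single mixed count. The key change that eventually worked (first message above) and the composite key Owen suggests below both avoid that.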
>> On Mon, Oct 6, 2008 at 11:38 AM, Owen O'Malley <omal...@apache.org> wrote:
>>
>>> On Sun, Oct 5, 2008 at 12:46 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>
>>> > What you need to do is snag access to the filename in the configure
>>> > method of the mapper.
>>>
>>> You can also do it in the map method with:
>>>
>>>     ((FileSplit) reporter.getInputSplit()).getPath()
>>>
>>> > Then instead of outputting just the word as the key, output a pair
>>> > containing the word and the file name as the key. Everything
>>> > downstream should remain the same.
>>>
>>> If you want to have each file handled by a single reduce, I'd suggest:
>>>
>>> class FileWordPair implements Writable {
>>>     private Text fileName;
>>>     private Text word;
>>>     ...
>>>     public int hashCode() {
>>>         return fileName.hashCode();
>>>     }
>>> }
>>>
>>> so that the HashPartitioner will send the records for file Foo to a
>>> single reducer. It would make sense to use this as an example for when
>>> to use grouping comparators (for getting a single call to reduce for
>>> each file) too...
>>>
>>> -- Owen
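For reference, a fleshed-out version of Owen's outline might look like the sketch below; this is my completion, not code from the thread. It assumes FileWordPair serves as the map output key, which in this API means implementing WritableComparable so the framework can sort keys; hashCode() uses only the file name so that HashPartitioner routes every record for a given file to the same reducer.

--------------------------
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

public class FileWordPair implements WritableComparable<FileWordPair> {
    private Text fileName = new Text();
    private Text word = new Text();

    public void set(String file, String w) {
        fileName.set(file);
        word.set(w);
    }

    public Text getFileName() { return fileName; }
    public Text getWord() { return word; }

    public void write(DataOutput out) throws IOException {
        fileName.write(out);
        word.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        fileName.readFields(in);
        word.readFields(in);
    }

    // Sort by file first, then word, so keys for one file are contiguous.
    public int compareTo(FileWordPair other) {
        int cmp = fileName.compareTo(other.fileName);
        return (cmp != 0) ? cmp : word.compareTo(other.word);
    }

    // Partition by file only, as Owen suggests.
    @Override
    public int hashCode() {
        return fileName.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FileWordPair)) return false;
        FileWordPair p = (FileWordPair) o;
        return fileName.equals(p.fileName) && word.equals(p.word);
    }
}
--------------------------

Pairing this with a grouping comparator that compares only fileName would give the single reduce call per file that Owen mentions.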