The 'fs.default.name' property in core-site.xml is what makes this happen. Its default value is "file:///", which corresponds to Hadoop's local mode; in local mode Hadoop resolves paths against the local file system. In distributed mode, 'fs.default.name' is set to something like "hdfs://IP_OF_NAMENODE/", and the same paths are then resolved in HDFS.
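As a sketch, a minimal core-site.xml for a distributed setup might look like the following (the host name and port here are placeholders, not values from this thread):

```xml
<!-- core-site.xml: controls which file system unqualified paths resolve against -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- default is file:/// (local mode); point at the NameNode for HDFS -->
    <value>hdfs://namenode.example.com:8020/</value>
  </property>
</configuration>
```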
Thanks,
Tejas

On Thu, Jan 2, 2014 at 7:28 PM, Bin Wang <binwang...@gmail.com> wrote:
> Hi there,
>
> When I went through the source code of Nutch - the ParseSegment class,
> which is the class to "parse content in a segment" - here is its map reduce
> job configuration part:
>
> http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/parse/ParseSegment.java?view=markup
> (Line 199 - 213)
>
> 199 JobConf job = new NutchJob(getConf());
> 200 job.setJobName("parse " + segment);
> 201
> 202 FileInputFormat.addInputPath(job, new Path(segment, Content.DIR_NAME));
> 203 job.set(Nutch.SEGMENT_NAME_KEY, segment.getName());
> 204 job.setInputFormat(SequenceFileInputFormat.class);
> 205 job.setMapperClass(ParseSegment.class);
> 206 job.setReducerClass(ParseSegment.class);
> 207
> 208 FileOutputFormat.setOutputPath(job, segment);
> 209 job.setOutputFormat(ParseOutputFormat.class);
> 210 job.setOutputKeyClass(Text.class);
> 211 job.setOutputValueClass(ParseImpl.class);
> 212
> 213 JobClient.runJob(job);
>
> Here, in line 202 and line 208, the map reduce input/output paths are
> configured by calling FileInputFormat.addInputPath and
> FileOutputFormat.setOutputPath, and the path looks like an absolute path
> on the Linux file system rather than an HDFS path.
>
> On the other hand, when I look at the WordCount example on the Hadoop
> homepage:
> https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html (Line 39 - 55)
>
> 39. JobConf conf = new JobConf(WordCount.class);
> 40. conf.setJobName("wordcount");
> 41.
> 42. conf.setOutputKeyClass(Text.class);
> 43. conf.setOutputValueClass(IntWritable.class);
> 44.
> 45. conf.setMapperClass(Map.class);
> 46. conf.setCombinerClass(Reduce.class);
> 47. conf.setReducerClass(Reduce.class);
> 48.
> 49. conf.setInputFormat(TextInputFormat.class);
> 50. conf.setOutputFormat(TextOutputFormat.class);
> 51.
> 52. FileInputFormat.setInputPaths(conf, new Path(args[0]));
> 53. FileOutputFormat.setOutputPath(conf, new Path(args[1]));
> 54.
> 55. JobClient.runJob(conf);
>
> Here, the input/output paths are configured in the same way as in Nutch,
> but the paths are actually passed in as command-line arguments:
>
> bin/hadoop jar /usr/joe/wordcount.jar org.myorg.WordCount
> /usr/joe/wordcount/input /usr/joe/wordcount/output
>
> And we can see the paths passed to the program are actually HDFS paths,
> not Linux OS paths.
>
> I am confused: is there some other configuration that I missed which
> leads to the run environment difference? In which case should I pass an
> absolute local path, and in which an HDFS path?
>
> Thanks a lot!
>
> /usr/bin
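To make the resolution rule concrete without pulling in Hadoop: the scheme logic can be sketched with plain java.net.URI. This is only an illustration of the idea (the class name DefaultFsDemo, the qualify helper, and the example host/port are hypothetical, not Hadoop API), but it mirrors how an unqualified Path inherits its file system from 'fs.default.name', while a path that already carries a scheme keeps it.

```java
import java.net.URI;

public class DefaultFsDemo {
    // Resolve a path the way Hadoop qualifies a Path against fs.default.name:
    // if the path already has a scheme (file://, hdfs://), keep it as-is;
    // otherwise resolve it against the default file system URI.
    static String qualify(String defaultFs, String path) {
        URI uri = URI.create(path);
        if (uri.getScheme() != null) {
            return uri.toString();   // already fully qualified
        }
        return URI.create(defaultFs).resolve(path).toString();
    }

    public static void main(String[] args) {
        // Local mode: the unqualified path lands on the local file system
        System.out.println(qualify("file:///", "/usr/joe/wordcount/input"));
        // Distributed mode: the very same path lands in HDFS
        System.out.println(qualify("hdfs://namenode.example.com:8020/",
                                   "/usr/joe/wordcount/input"));
    }
}
```

So the same string "/usr/joe/wordcount/input" is not inherently a Linux path or an HDFS path; which file system it names is decided by the configuration the job runs under.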