Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by Arun C Murthy:
http://wiki.apache.org/lucene-hadoop/FAQ

The comment on the change is:
Added a section on how to write maps which process complete input files

------------------------------------------------------------------------------
  The entire discussion holds true for maps of jobs with reducer=NONE (i.e. 0 reduces) since output of the map, in that case, goes directly to hdfs.
+ 
+ [[BR]]
+ [[Anchor(10)]]
+ '''10. [#10 How do I get each of my maps to work on one complete input-file and not allow the framework to split-up my files?]'''
+ 
+ Essentially, a job's input is represented by the [http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/InputFormat.html InputFormat] (interface) / [http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/FileInputFormat.html FileInputFormat] (base class).
+ 
+ For this purpose one needs a 'non-splittable' [http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/FileInputFormat.html FileInputFormat], i.e. an input-format which tells the map-reduce framework that its input files must not be split up, so that each file is processed whole by a single map. To do this, your particular input-format needs to return '''false''' from the [http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/FileInputFormat.html#isSplitable(org.apache.hadoop.fs.FileSystem,%20org.apache.hadoop.fs.Path) isSplitable] call.
+ 
+ E.g. '''org.apache.hadoop.mapred.Sort``Validator.Record``Stats``Checker.Non``Splitable``Sequence``File``Input``Format''' in [http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/org/apache/hadoop/mapred/SortValidator.java src/test/org/apache/hadoop/mapred/SortValidator.java]
+ 
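
The effect of the isSplitable call described in the FAQ entry above can be illustrated with a minimal, self-contained sketch. It deliberately does not use the Hadoop classes: `SketchFileInputFormat`, `NonSplittableInputFormat`, `Split` and the fixed block-size splitting rule are all hypothetical stand-ins, and Hadoop's real split computation is more involved. In Hadoop itself the override would be of `isSplitable(FileSystem, Path)` on a FileInputFormat subclass, as in the SortValidator example referenced above.

```java
import java.util.ArrayList;
import java.util.List;

public class WholeFileSplitSketch {

    /** Hypothetical stand-in for an input split: a byte range of one file. */
    static final class Split {
        final String file;
        final long start;
        final long length;

        Split(String file, long start, long length) {
            this.file = file;
            this.start = start;
            this.length = length;
        }
    }

    /**
     * Hypothetical stand-in for a file-based input format (NOT Hadoop's class).
     * It cuts a file into block-sized splits unless isSplitable() returns false.
     */
    static class SketchFileInputFormat {
        /** Default: files may be split, mirroring the idea of the real hook. */
        protected boolean isSplitable(String file) {
            return true;
        }

        List<Split> getSplits(String file, long fileLength, long blockSize) {
            List<Split> splits = new ArrayList<>();
            if (!isSplitable(file)) {
                // One split covering the whole file, so one map sees all of it.
                splits.add(new Split(file, 0, fileLength));
                return splits;
            }
            // Simplified assumption: one split per block-sized chunk.
            for (long offset = 0; offset < fileLength; offset += blockSize) {
                long length = Math.min(blockSize, fileLength - offset);
                splits.add(new Split(file, offset, length));
            }
            return splits;
        }
    }

    /** The 'non-splittable' variant: the only change is returning false. */
    static class NonSplittableInputFormat extends SketchFileInputFormat {
        @Override
        protected boolean isSplitable(String file) {
            return false;
        }
    }

    public static void main(String[] args) {
        long fileLength = 250;
        long blockSize = 100;
        int splittable = new SketchFileInputFormat()
                .getSplits("part-00000", fileLength, blockSize).size();
        int whole = new NonSplittableInputFormat()
                .getSplits("part-00000", fileLength, blockSize).size();
        System.out.println("default splits: " + splittable);
        System.out.println("non-splittable splits: " + whole);
    }
}
```

In this model the only difference between the two formats is the overridden method; everything else about reading the input stays the same, which is why subclassing an existing input format is usually sufficient.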