[Lucene-hadoop Wiki] Update of "FAQ" by Arun C Murthy

Apache Wiki Mon, 24 Sep 2007 11:32:48 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for 
change notification.


The following page has been changed by Arun C Murthy:
http://wiki.apache.org/lucene-hadoop/FAQ

The comment on the change is:
Added a section on how to write maps which process complete input files

------------------------------------------------------------------------------
  
  The entire discussion holds true for maps of jobs with reducer=NONE (i.e. 0 reduces), since in that case the output of the map goes directly to HDFS.
  
+ 
+ [[BR]]
+ [[Anchor(10)]]
+ '''10. [#10 How do I get each of my maps to work on one complete input-file and not allow the framework to split up my files?]'''
+ 
+ Essentially, a job's input is represented by the [http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/InputFormat.html InputFormat] (interface) / [http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/FileInputFormat.html FileInputFormat] (base class).
+ 
+ For this purpose one needs a 'non-splittable' [http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/FileInputFormat.html FileInputFormat], i.e. an input-format which tells the map-reduce framework that its input files must not be split up, so that each file is processed in its entirety by a single map. To do this, your input-format must return '''false''' from the [http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/FileInputFormat.html#isSplitable(org.apache.hadoop.fs.FileSystem,%20org.apache.hadoop.fs.Path) isSplitable] call.
+  
+ E.g. '''org.apache.hadoop.mapred.Sort``Validator.Record``Stats``Checker.Non``Splitable``Sequence``File``Input``Format''' in [http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/test/org/apache/hadoop/mapred/SortValidator.java src/test/org/apache/hadoop/mapred/SortValidator.java]
+  
+
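The effect of the override can be sketched without Hadoop on the classpath. What follows is a standalone toy model, not Hadoop code: `SplitDemo`, `ToyFileInputFormat`, `ToyNonSplitableInputFormat`, `Split` and the simplified `getSplits` are hypothetical stand-ins, and the hook here takes a plain `String` where the real `isSplitable` takes a `FileSystem` and a `Path` (per the API link above). It is only meant to illustrate why returning false from the hook yields one split, and hence one map, per file.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone toy model, NOT Hadoop code: every class here is a hypothetical
// stand-in so the example compiles without Hadoop on the classpath.
class SplitDemo {

    // One contiguous byte range of a file, handed to a single map.
    static final class Split {
        final String file;
        final long start;
        final long length;

        Split(String file, long start, long length) {
            this.file = file;
            this.start = start;
            this.length = length;
        }
    }

    // Stand-in for a file-based input format that splits by default.
    static class ToyFileInputFormat {
        // Hook that subclasses override; the real Hadoop hook takes
        // (FileSystem, Path) instead of a String.
        protected boolean isSplitable(String file) {
            return true;
        }

        // Cut the file into splitSize-byte ranges unless the hook says no.
        List<Split> getSplits(String file, long fileLength, long splitSize) {
            List<Split> splits = new ArrayList<>();
            if (!isSplitable(file)) {
                splits.add(new Split(file, 0, fileLength)); // whole file, one map
                return splits;
            }
            for (long start = 0; start < fileLength; start += splitSize) {
                long length = Math.min(splitSize, fileLength - start);
                splits.add(new Split(file, start, length));
            }
            return splits;
        }
    }

    // The non-splittable variant: the only change is the overridden hook.
    static class ToyNonSplitableInputFormat extends ToyFileInputFormat {
        @Override
        protected boolean isSplitable(String file) {
            return false;
        }
    }

    public static void main(String[] args) {
        long fileLength = 250L;
        long splitSize = 100L;
        int splittable = new ToyFileInputFormat()
                .getSplits("part-00000", fileLength, splitSize).size();
        int whole = new ToyNonSplitableInputFormat()
                .getSplits("part-00000", fileLength, splitSize).size();
        System.out.println("default: " + splittable
                + " splits, non-splittable: " + whole + " split");
    }
}
```

Run as-is, this prints `default: 3 splits, non-splittable: 1 split`. In a real job the equivalent subclass would extend an existing Hadoop input-format and override only `isSplitable`; the `SortValidator` source referenced above contains a working instance.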
