Lei Chen wrote:
It seems that a big file can be split in the middle of a line. But map/reduce will still work properly, since the DFS layer hides the block layout information from the map/reduce tasks.
It's up to the InputFormat to handle records that are split on FileSplit boundaries.
TextInputFormat's reader finishes the line that straddles the end of its split by reading past the split boundary, and a split that starts mid-file skips ahead to the first linebreak it encounters, so the straddling line is read exactly once (by the previous split). See http://svn.apache.org/viewcvs.cgi/lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/TextInputFormat.java?view=markup for details.
(I added this info to http://wiki.apache.org/lucene-hadoop/HadoopMapReduce).
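To make the boundary rule concrete, here is a minimal standalone sketch (not the actual Hadoop source; `SplitLineDemo` and `readSplit` are hypothetical names, and real LineRecordReader also deals with buffering, seeks, etc.). The convention illustrated: a split owns every line whose first byte falls inside [start, end), which means reading past `end` to finish the last owned line, and skipping ahead to the first newline when the split starts mid-file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the split-boundary handling described above.
public class SplitLineDemo {

    // Return the lines "owned" by the byte range [start, end) of data.
    static List<String> readSplit(byte[] data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // A split that begins mid-file skips ahead to the first linebreak
            // at or after start-1; the previous split has already read the
            // line that straddles the boundary. (Starting the scan at start-1
            // handles the case where the split begins exactly at a line start.)
            pos = start - 1;
            while (pos < data.length && data[pos] != '\n') pos++;
            pos++; // first line start at or after `start`
        }
        List<String> lines = new ArrayList<>();
        // Read every line that *begins* before `end`; the last such line may
        // extend past the boundary, so we deliberately read beyond `end`.
        while (pos < end && pos < data.length) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') pos++;
            lines.add(new String(data, lineStart, pos - lineStart));
            pos++; // step over the newline
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\n".getBytes();
        // Split the file at byte 8, which falls in the middle of "bravo":
        System.out.println(readSplit(data, 0, 8));   // [alpha, bravo]
        System.out.println(readSplit(data, 8, 20));  // [charlie]
    }
}
```

Because every line's first byte lies in exactly one split, each record is read exactly once even though the byte ranges cut through records arbitrarily.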
