Hi, structured data can indeed end up split across block boundaries, e.g. in the middle of a word or line. However, a MapReduce task reads HDFS data in units of *lines*: when a line straddles a block boundary, the record reader reads past the end of its own block into the start of the next one to pick up the rest of that line (and, conversely, skips a leading partial line, since the previous reader already consumed it). So you do not need to worry about incomplete records. Note that HDFS itself does nothing special here; this mechanism lives entirely in the MapReduce input layer.
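To make the idea concrete, here is a minimal sketch (in Python, not actual Hadoop source, which implements this in Java in its line record reader): each split's reader skips a leading partial line unless it starts at offset 0, and it owns any line that *starts* inside its split, reading past the split end to finish it.

```python
# Hypothetical illustration of line-oriented split reading; the data,
# split size, and function name are made up for this example.
data = b"alpha beta\ngamma delta\nepsilon zeta\n"
split_size = 8  # artificially small; real HDFS blocks are e.g. 64 MB

def read_lines_for_split(data, start, end):
    """Return the complete lines 'owned' by the split [start, end)."""
    pos = start
    if start != 0:
        # Skip the partial line at the front: it belongs to the previous
        # split, whose reader reads past its own end to finish it.
        nl = data.find(b"\n", start)
        if nl == -1:
            return []
        pos = nl + 1
    lines = []
    while pos < end:  # a line starting before `end` is ours, even if it
        nl = data.find(b"\n", pos)  # continues into the next split
        if nl == -1:
            lines.append(data[pos:])
            break
        lines.append(data[pos:nl])
        pos = nl + 1
    return lines

splits = [(s, min(s + split_size, len(data)))
          for s in range(0, len(data), split_size)]
all_lines = [ln for s, e in splits for ln in read_lines_for_split(data, s, e)]
print(all_lines)  # each original line appears exactly once
```

Even though every line here crosses an 8-byte split boundary, each one is recovered exactly once, which is the guarantee the framework gives your mapper.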
-Regards
Denny Ye

On Fri, Nov 11, 2011 at 3:43 PM, 臧冬松 <donal0...@gmail.com> wrote:
> Usually a large file in HDFS is split into blocks and stored on different
> DataNodes.
> A map task is assigned to deal with one such block. I wonder what happens
> if structured data (i.e. a word) is split across two blocks?
> How do MapReduce and HDFS deal with this?
>
> Thanks!
> Donal