Thanks Denny! So that means each map task may have to read from another DataNode in order to finish reading the last line of its block?
Cheers,
Donal

2011/11/11 Denny Ye <denny...@gmail.com>

> hi
>    Structured data is often split across different blocks, e.g. a word
> or a line.
>    A MapReduce task reads HDFS data in units of *lines*: it will read the
> whole line, continuing from the end of the previous block into the start
> of the subsequent one, to obtain the complete line record. So you do not
> need to worry about incomplete structured data. HDFS itself does nothing
> special for this; the mechanism lives in the MapReduce layer.
>
> -Regards
> Denny Ye
>
> On Fri, Nov 11, 2011 at 3:43 PM, 臧冬松 <donal0...@gmail.com> wrote:
>
>> Usually a large file in HDFS is split into blocks and stored on
>> different DataNodes.
>> A map task is assigned to deal with one such block. I wonder what
>> happens if structured data (i.e. a word) is split across two blocks?
>> How do MapReduce and HDFS deal with this?
>>
>> Thanks!
>> Donal
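The behavior Denny describes can be sketched in code. The following is a simplified, self-contained model of the idea behind Hadoop's line-oriented record reading, not actual Hadoop source: a reader whose split does not start at offset 0 skips the first (partial) line, because the previous split's reader will have read it in full by continuing past its own split boundary. Class and method names here are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitLineReader {

    // Return the lines "owned" by the byte range [start, end) of data.
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> lines = new ArrayList<>();
        int pos = start;
        // A split that does not begin at offset 0 skips up to and including
        // the first newline: that partial line belongs to the previous split,
        // whose reader finishes it by reading past its own boundary.
        if (start != 0) {
            while (pos < data.length && data[pos] != '\n') pos++;
            pos++; // step past the newline
        }
        // Emit every line that STARTS inside the split; the final line may
        // run past `end` into the next block (possibly on another DataNode).
        while (pos < end && pos < data.length) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') pos++;
            lines.add(new String(data, lineStart, pos - lineStart));
            pos++; // skip the newline
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\n".getBytes();
        // Pretend the block boundary falls at byte 8, in the middle of "bravo".
        System.out.println(readSplit(data, 0, 8));            // [alpha, bravo]
        System.out.println(readSplit(data, 8, data.length));  // [charlie]
    }
}
```

Note that the first reader emits "bravo" whole even though the split ends mid-word, and the second reader skips those leftover bytes; every line is processed exactly once. This is also why the answer to the follow-up question is yes: finishing that last line can require a (usually small) remote read from the DataNode holding the next block.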