If, for example, you have a record that spans two blocks, 20MB in one and 1MB in the other, MapReduce will feed you the entire 21MB record. If you are lucky and the map is executing on a node that holds the 20MB block locally, MapReduce only has to pull the remaining 1MB out of HDFS over the network for you.
This is glossing over some details, but the point is that MR will feed you whole records regardless of whether they are stored on one or two blocks.

Brian

On Mar 4, 2011, at 2:24 PM, Kelly Burkhart wrote:

> On Fri, Mar 4, 2011 at 1:42 PM, Harsh J <[email protected]> wrote:
>> HDFS does not operate with records in mind.
>
> So does that mean that HDFS will break a file at exactly <blocksize>
> bytes? Map/Reduce *does* operate with records in mind, so what
> happens to the split record? Does HDFS put the fragments back
> together and deliver the reconstructed record to one map? Or are both
> fragments and consequently the whole record discarded?
>
> Thanks,
>
> -Kelly
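The detail Brian is glossing over can be sketched in a few lines. Hadoop's TextInputFormat follows a simple convention: every reader except the first skips forward to the first newline in its split (the previous reader owns that partial record), and every reader finishes the last record it started even if that record runs past the end of its split into the next block. The sketch below is a hypothetical `read_split` helper simulating that rule on an in-memory byte string, not Hadoop's actual LineRecordReader code:

```python
def read_split(data: bytes, start: int, end: int) -> list:
    """Return the newline-terminated records a reader emits for split [start, end)."""
    pos = start
    if start != 0:
        # Not the first split: discard everything up to and including the
        # first newline -- the previous split's reader owns that record.
        nl = data.find(b"\n", start)
        if nl == -1:
            return []
        pos = nl + 1
    records = []
    # Keep reading while a record *starts* inside the split (pos <= end);
    # the last record may extend past `end` into the next block.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl + 1
        records.append(data[pos:stop])
        pos = stop
    return records

# Middle record deliberately straddles the 16-byte "block" boundaries.
data = b"record-1\nrecord-2-spans-a-block-boundary\nrecord-3\n"
splits = [(0, 16), (16, 32), (32, len(data))]
all_records = [r for (s, e) in splits for r in read_split(data, s, e)]
assert b"".join(all_records) == data  # every record delivered whole, exactly once
```

Run against these splits, the first reader emits record-1 and the whole spanning record-2 (reading past its split boundary), the second reader emits nothing, and the third emits record-3 — no record is ever split or discarded.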
