If, for example, you have a record with 20MB in one block and 1MB in 
another, MapReduce will feed you the entire 21MB record.  If you are lucky and 
the map is executing on a node that holds the 20MB block locally, MapReduce 
only has to pull the remaining 1MB across the network from HDFS.

This glosses over some details, but the point is that MapReduce will feed you 
whole records regardless of whether they sit in one block or span two.
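
If you want to see the convention concretely, here is a minimal sketch in 
plain Java of the boundary handling that readers like Hadoop's 
LineRecordReader apply, using newline-delimited records over a local file. 
The class and method names below are illustrative, not Hadoop APIs:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

public class SplitBoundaryDemo {
    // Return the newline-delimited records "owned" by the byte range
    // [start, start + length).  A record belongs to the split in which
    // it begins; the reader runs past the end of its split to finish
    // the last record, and skips the partial record at its own start.
    static List<String> readRecordsForSplit(RandomAccessFile f,
                                            long start, long length)
            throws IOException {
        long end = start + length;
        f.seek(start);
        // Every split except the first discards its leading (possibly
        // partial) line; the previous split's reader already read it.
        if (start != 0) {
            f.readLine();
        }
        List<String> records = new ArrayList<>();
        // Read while the current record starts at or before 'end'; the
        // '<=' means a record beginning exactly on the boundary belongs
        // to this split, matching the skip done by the next split.
        while (f.getFilePointer() <= end) {
            String line = f.readLine();
            if (line == null) break;   // end of file
            records.add(line);
        }
        return records;
    }
}

The real record readers do the same dance against an HDFS stream, which is 
why a map task whose split ends mid-record quietly reads a little extra data 
from the next block.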

Brian

On Mar 4, 2011, at 2:24 PM, Kelly Burkhart wrote:

> On Fri, Mar 4, 2011 at 1:42 PM, Harsh J <[email protected]> wrote:
>> HDFS does not operate with records in mind.
> 
> So does that mean that HDFS will break a file at exactly <blocksize>
> bytes?  Map/Reduce *does* operate with records in mind, so what
> happens to the split record?  Does HDFS put the fragments back
> together and deliver the reconstructed record to one map?  Or are both
> fragments and consequently the whole record discarded?
> 
> Thanks,
> 
> -Kelly
