Shreya,

The metadata is all stored in the NameNode.  It stores where all of the blocks 
are located and the order of the blocks in a file.
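
If you are curious you can ask the NameNode for that mapping yourself through 
the standard FileSystem API.  Here is a rough sketch (the class name and path 
are made up, the calls are the stock Hadoop ones):

    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShowBlocks {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path path = new Path("/user/shreya/big-file.txt");  // placeholder path
        FileStatus stat = fs.getFileStatus(path);
        // Ask the NameNode for every block in the file, in order, along
        // with the machines that hold a replica of each block.
        BlockLocation[] blocks =
            fs.getFileBlockLocations(stat, 0, stat.getLen());
        for (int i = 0; i < blocks.length; i++) {
          System.out.println("block " + i
              + " offset=" + blocks[i].getOffset()
              + " hosts=" + Arrays.toString(blocks[i].getHosts()));
        }
      }
    }
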
Data is merged as needed, behind the scenes, when you call methods on the 
instance of java.io.InputStream returned when calling open.  So, when you open 
a file for reading you are making a connection to one of the machines that has 
a copy of the first block of the file.  As you read the data and finish with 
the first block, the second block is then fetched for you from whatever 
machine has a copy of it, and you continue until all of the blocks are read.
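
In client code the whole read path looks roughly like this (again just a 
sketch with a made-up class name and path):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadWholeFile {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path path = new Path("/user/shreya/big-file.txt");  // placeholder path
        // open() fetches the block list from the NameNode; the stream it
        // returns is a subclass of java.io.InputStream.
        FSDataInputStream in = fs.open(path);
        byte[] buf = new byte[4096];
        int n;
        // Each read() is served by whichever DataNode holds the current
        // block; the stream switches machines at block boundaries, so the
        // client never sees an explicit merge step.
        while ((n = in.read(buf)) != -1) {
          System.out.write(buf, 0, n);
        }
        in.close();
        System.out.flush();
      }
    }
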
Typically in map/reduce each mapper that is reading data will read one block, 
and possibly a little bit more from the start of the next block so that a 
record crossing the block boundary is not cut in half.  That way you never 
have all of the file in memory on any one machine.  Each mapper also only 
processes a small part of the block at a time, one key/value pair.  However, 
there is nothing stopping you from doing something bad, like trying to cache 
the entire contents of the file in memory as you read it from the stream, 
except that you would eventually get an out of memory exception.
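
To make that concrete, here is a rough sketch of a mapper (the class name is 
made up; with the stock TextInputFormat the key is the byte offset into the 
file and the value is one line of text):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // The framework calls map() once per record, so the mapper only ever
    // holds one key/value pair (one line of the block), never the whole
    // file.
    public class LineLengthMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
      private final Text outKey = new Text("line-length");

      @Override
      protected void map(LongWritable offset, Text line, Context context)
          throws IOException, InterruptedException {
        // line.getLength() is the byte length of the line's UTF-8 text.
        context.write(outKey, new IntWritable(line.getLength()));
      }
    }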

--
Bobby Evans

On 4/19/11 4:19 AM, "Shreya Chakravarty" <shreya_chakrava...@persistent.co.in> 
wrote:

Hi,

I have a query regarding how Hadoop merges back the data that has been split 
into blocks and stored on different nodes.

* Where is the data merged, since we say that the file can be so huge that it 
doesn't fit onto one machine?

* Where is the sequence maintained for merging it back?

Thanks and Regards,
Shreya Chakravarty

