1. Your first guess is right - the file is 'broken' into blocks, and each
block is placed on datanodes according to the replication factor and the
block placement policy (see the sketch below for a way to inspect this).
2. Moving existing blocks onto new nodes doesn't happen automatically, as far
as I know. One has to run the balancer to re-balance the cluster in this case.
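To make the first point concrete, here is a minimal sketch (the class name and
the path argument are mine, not from this thread) that asks the namenode where
each block of a file lives, using the stock FileSystem/BlockLocation API. With
replication set to 3 and a file larger than one block, each block should report
3 hosts, spread across different datanodes:

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Prints, for each block of the given HDFS file, its offset, length and
// the datanodes that hold a replica of it.
public class PrintBlockLocations {
  public static void main(String[] args) throws Exception {
    Path file = new Path(args[0]);          // e.g. /user/someone/bigfile (any existing HDFS path)

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);   // uses the default filesystem from your cluster config

    FileStatus status = fs.getFileStatus(file);
    // One BlockLocation per block; getHosts() lists the datanodes holding a replica.
    BlockLocation[] blocks =
        fs.getFileBlockLocations(status, 0, status.getLen());

    for (int i = 0; i < blocks.length; i++) {
      System.out.println("block " + i
          + " offset=" + blocks[i].getOffset()
          + " len=" + blocks[i].getLength()
          + " hosts=" + Arrays.toString(blocks[i].getHosts()));
    }
    fs.close();
  }
}

For the second point, the re-balancing is done with the balancer tool
(bin/hadoop balancer, or bin/start-balancer.sh); it moves block replicas from
over-utilized datanodes to under-utilized ones, such as freshly added nodes,
until disk usage across the cluster is within a configurable threshold.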
--
Take care,
Cos
On 11/16/09 13:47, Massoud Mazar wrote:
This is probably a basic question:
Assuming replication is set to 3, when we store a large file in HDFS, is
the whole file stored on 3 nodes (even if you have many more nodes), or
is it broken into blocks with each block written to 3 nodes? (I assume
it is the latter, so when you have 30 nodes available, each one gets a
piece of the file, providing more performance when reading the file.)
My second question is: what happens if we add more nodes to an existing
cluster? Would any existing blocks be moved to the new nodes so that the
data gets spread across them as well?
Thanks
Massoud