This is probably a basic question:

Assuming replication is set to 3, when we store a large file in HDFS, is the
whole file stored on 3 nodes (even if you have many more nodes), or is it
broken into blocks, with each block written to 3 nodes? (I assume it is the
latter, so when you have 30 nodes available, each one gets a piece of the
file, giving better performance when reading the file.)
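
In case it helps make the question concrete: my understanding is that the
FileSystem API can report where each block of a file lives, along the lines
of this sketch (the path is just an example I made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockReport {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/user/massoud/bigfile");  // example path
            FileStatus status = fs.getFileStatus(file);
            // One BlockLocation per block; getHosts() lists the datanodes
            // holding that block's replicas (3 entries if replication=3).
            BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
            for (int i = 0; i < blocks.length; i++) {
                System.out.printf("block %d: offset=%d length=%d hosts=%s%n",
                        i, blocks[i].getOffset(), blocks[i].getLength(),
                        String.join(", ", blocks[i].getHosts()));
            }
            fs.close();
        }
    }

If my assumption is right, a large file should print many blocks, each with
3 hosts, spread across the cluster rather than pinned to the same 3 nodes.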

My second question: what happens if we add more nodes to an existing cluster?
Would any existing blocks be moved to these new nodes so the data is spread
across them as well?
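
My guess from the docs is that nothing moves automatically and one has to run
the balancer tool (hdfs balancer) to redistribute existing blocks. If so, a
per-datanode usage report like the sketch below should show whether the new
nodes have picked up any data before and after balancing:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

    public class NodeUsage {
        public static void main(String[] args) throws IOException {
            DistributedFileSystem dfs =
                (DistributedFileSystem) FileSystem.get(new Configuration());
            // Report per-datanode DFS usage; freshly added nodes should sit
            // near 0% until blocks are written or rebalanced onto them.
            for (DatanodeInfo node : dfs.getDataNodeStats()) {
                System.out.printf("%s: %.1f%% of %d bytes used%n",
                        node.getHostName(),
                        100.0 * node.getDfsUsed() / node.getCapacity(),
                        node.getCapacity());
            }
            dfs.close();
        }
    }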

Thanks
Massoud
