This is probably a basic question: assuming replication is set to 3, when we store a large file in HDFS, is the whole file stored on 3 nodes (even if you have many more nodes), or is it broken into blocks, with each block written to 3 nodes? (I assume it is the latter, so that with 30 nodes available each one gets a piece of the file, giving better read performance.)
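To make concrete what I imagine happens, here is a toy sketch (not real HDFS code): split a file into fixed-size blocks and place each block on 3 of the available nodes. The 128 MB block size matches the HDFS default; the node names and the random placement are just illustrative, and real HDFS placement is rack-aware.

```python
import math
import random

BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size (128 MB)
REPLICATION = 3

def place_blocks(file_size, nodes, replication=REPLICATION):
    """Toy model: split a file into blocks and assign each block
    to `replication` distinct nodes (ignores rack awareness)."""
    n_blocks = math.ceil(file_size / BLOCK_SIZE)
    return {b: random.sample(nodes, replication) for b in range(n_blocks)}

nodes = [f"node{i:02d}" for i in range(30)]
# A 1 GB file becomes 8 blocks; each block lives on 3 distinct nodes,
# so many of the 30 nodes end up holding some piece of the file.
placement = place_blocks(1024 * 1024 * 1024, nodes)
print(len(placement))                 # number of blocks
print(len(placement[0]))              # replicas per block
```

So in this picture no single node holds the whole file; reads can be served by many nodes in parallel.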
My second question: what happens if we add more nodes to an existing cluster? Are any existing blocks moved to the new nodes to spread the data across them? Thanks, Massoud
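To illustrate the kind of rebalancing I mean, here is a toy sketch that just shuffles block counts from the fullest node to the emptiest until they are roughly even. This is purely illustrative (the node names and threshold are made up), not how HDFS actually rebalances; HDFS has a separate balancer tool an operator runs.

```python
def rebalance(block_counts, threshold=2):
    """Toy rebalancer: move one block at a time from the fullest
    node to the emptiest until counts differ by at most `threshold`."""
    counts = dict(block_counts)
    while True:
        fullest = max(counts, key=counts.get)
        emptiest = min(counts, key=counts.get)
        if counts[fullest] - counts[emptiest] <= threshold:
            return counts
        counts[fullest] -= 1
        counts[emptiest] += 1

# Existing cluster with data, plus two freshly added empty nodes.
cluster = {"node00": 40, "node01": 38, "node02": 42, "new03": 0, "new04": 0}
balanced = rebalance(cluster)
print(balanced)  # block counts end up within the threshold of each other
```

The question is whether anything like this happens automatically when nodes join, or whether new nodes only receive newly written blocks until something rebalances the cluster.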