Boyu Zhang wrote:
Dear All,
I have a question about HDFS and I cannot find the answer in the
documents on the Apache website. I have a cluster of 4 machines: one is
the namenode and the other 3 are datanodes. When I put 6 files, each 430 MB,
into HDFS, the 6 files are split into 42 blocks (64 MB each). But what policies
are used to assign these blocks to datanodes? In my case, machine1 got 14
blocks, machine2 got 12 blocks and machine3 got 16 blocks.
Could anyone help me with this? Or is there any documentation I can read
that would clarify it?
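The 42-block figure follows directly from the block size. A quick sketch of the arithmetic (the 430 MB file size and 64 MB block size are from the message above; HDFS does not pad the final, partial block):

```python
import math

BLOCK_SIZE_MB = 64
FILE_SIZE_MB = 430
NUM_FILES = 6

# Each 430 MB file is split into ceil(430 / 64) = 7 blocks;
# the last block is a partial block of 430 - 6*64 = 46 MB.
blocks_per_file = math.ceil(FILE_SIZE_MB / BLOCK_SIZE_MB)
last_block_mb = FILE_SIZE_MB - (blocks_per_file - 1) * BLOCK_SIZE_MB
total_blocks = blocks_per_file * NUM_FILES

print(blocks_per_file)  # 7
print(last_block_mb)    # 46
print(total_blocks)     # 42
```

Note that 14 + 12 + 16 = 42, i.e. each block appears on exactly one datanode, which is consistent with a replication factor of 1.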
1. Don't panic if nobody replies to your message within an hour, and
don't resend. Hadoop developers/users are in many different timezones,
and people often only look at the list at odd times of the day. It's
best to wait 24 hours before worrying that your email got lost.
2. The namenode decides, usually placing two replicas of a block on one
rack and another replica on a different rack. This saves datacentre
backbone bandwidth, but isolates you from the loss of an entire rack
(not so unusual once your rack is on shared DC power/PSUs). Since your
per-node counts sum to 42, your replication factor appears to be 1, in
which case each block simply lands on one datanode and the counts per
node need not come out exactly equal.
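A toy model of that default placement rule may make it concrete. This is only a sketch, not the actual HDFS code; the rack/node names and the use of uniform random choice are illustrative assumptions:

```python
import random

def place_replicas(racks, replication=3):
    """Toy model of HDFS default placement: one replica on a first rack,
    the remaining replicas together on a second, different rack, so that
    losing one rack never loses every copy of a block."""
    if replication == 1 or len(racks) == 1:
        # Single rack or no replication: just spread blocks over nodes.
        nodes = [n for members in racks.values() for n in members]
        return random.sample(nodes, min(replication, len(nodes)))
    first_rack, second_rack = random.sample(list(racks), 2)
    chosen = [random.choice(racks[first_rack])]
    # Remaining replicas share the second rack, on distinct nodes.
    chosen += random.sample(racks[second_rack],
                            min(replication - 1, len(racks[second_rack])))
    return chosen

# Example: two racks of three datanodes each (hypothetical names).
racks = {"rack1": ["dn1", "dn2", "dn3"],
         "rack2": ["dn4", "dn5", "dn6"]}
replicas = place_replicas(racks)
# The 3 replicas land on distinct nodes spanning exactly 2 racks.
```

With a single rack and replication factor 1, as in the cluster described above, this degenerates to picking a datanode per block, which is why the per-node counts (14/12/16) are only roughly balanced.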
3. There has been discussion of having a plug-in placement policy here,
but it would need to work with the balancer, the code that rebalances
blocks across machines in the background.