Mag Gam wrote:
I just set up my first Hadoop cluster with 5 nodes. What is the best
way to check if replication is really working? I assume the best way
is to power down 2 nodes and see if I can still reach my data?
Or are there any other ways?
TIA
Well, if you run
hadoop fsck /
it will give you a report on the number of replicas. If a file has fewer
replicas than its configured replication factor, the report will flag
its blocks as under-replicated.
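
If you want to check that report from a script, here is a rough Python sketch. It assumes the summary lines look like "Under-replicated blocks: N"; the sample text is made up for illustration and the exact layout may differ between Hadoop versions. The names run_fsck and block_counts are hypothetical helpers, not part of Hadoop.

```python
import re
import subprocess

# Made-up sample for illustration only; real fsck output has more lines
# and its exact layout may vary between Hadoop versions.
SAMPLE_SUMMARY = """\
 Total blocks (validated):      120 (avg. block size 1048576 B)
 Over-replicated blocks:        0
 Under-replicated blocks:       7
 Default replication factor:    3
 Corrupt blocks:                0
"""

def run_fsck(path="/"):
    """Run `hadoop fsck <path>` and return its text output (needs a live cluster)."""
    result = subprocess.run(["hadoop", "fsck", path],
                            capture_output=True, text=True)
    return result.stdout

def block_counts(fsck_output):
    """Return {label: count} for each summary line whose label ends in 'blocks'."""
    counts = {}
    for line in fsck_output.splitlines():
        match = re.match(r"\s*(.+?blocks)\b[^:]*:\s+(\d+)", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts

counts = block_counts(SAMPLE_SUMMARY)
print(counts["Under-replicated blocks"])  # prints 7
```

On a real cluster you would pass run_fsck("/") to block_counts instead of the sample text, and treat a non-zero under-replicated count as the thing to investigate.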
As for powering off 2 nodes, that will likely result in a few
missing blocks unless you have more than 2 replicas of each block. If
you turn off just one data node, after a while Hadoop will re-replicate
its blocks across the remaining 4 nodes. When you turn the 5th back on,
you will have over-replicated blocks, and Hadoop will then delete the
excess replicas.
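
To put a rough number on the two-node case, here is a small Python sketch. It assumes each block's replicas sit on distinct nodes chosen uniformly at random, which is a simplification (real HDFS placement is rack-aware), so treat the figures as a ballpark only. The function name loss_probability is just for illustration.

```python
from math import comb

def loss_probability(nodes, replicas, powered_off):
    """Chance that every replica of a given block sits on a powered-off node,
    assuming replicas are on distinct nodes chosen uniformly at random."""
    return comb(powered_off, replicas) / comb(nodes, replicas)

# 5-node cluster with 2 nodes powered off, for replication factors 1 to 3.
for replicas in (1, 2, 3):
    print(replicas, loss_probability(5, replicas, 2))
# prints:
# 1 0.4
# 2 0.1
# 3 0.0
```

So under that assumption, with 2 replicas about one block in ten would go missing, and with 3 replicas none would.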
Terrence