Hi,
Our current cluster runs 22 data nodes, each with 4TB of storage.
We plan to add new data nodes to this existing cluster, but each of them will
have 8TB of storage capacity.
I am wondering how the namenode will distribute the blocks. It is my
understanding that the replica placement policy chooses data nodes at
random, so an even distribution of blocks across nodes is expected.
Eventually the smaller nodes will fill up while the larger nodes are only
about 50% full, at which point the small nodes will become unusable.
Am I correct?
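
To make the arithmetic behind this concern concrete, here is a rough Python
sketch of what I expect. It is only a toy model: it assumes data is placed
uniformly at random across all nodes and ignores replication, rack awareness,
and any free-space checks the real placement policy may apply. The number of
new nodes (10), the 1GB placement unit, and the function name are made-up
values for illustration, not anything taken from our cluster or from HDFS.

```python
import random


def simulate_uniform_placement(small_nodes=22, large_nodes=10,
                               small_cap_gb=4096, large_cap_gb=8192, seed=0):
    """Place 1GB units on uniformly random nodes until a small node is full.

    Returns (average small-node utilization, average large-node utilization).
    large_nodes=10 is a hypothetical count; the real number is not decided.
    """
    rng = random.Random(seed)
    total_nodes = small_nodes + large_nodes
    used = [0] * total_nodes  # GB stored per node; small nodes come first

    while True:
        node = rng.randrange(total_nodes)  # uniform choice, capacity ignored
        used[node] += 1
        # Stop as soon as the first 4TB node has no space left.
        if node < small_nodes and used[node] >= small_cap_gb:
            break

    small_util = sum(used[:small_nodes]) / (small_nodes * small_cap_gb)
    large_util = sum(used[small_nodes:]) / (large_nodes * large_cap_gb)
    return small_util, large_util


if __name__ == "__main__":
    small, large = simulate_uniform_placement()
    print(f"small nodes: {small:.0%} full, large nodes: {large:.0%} full")
```

In this model every node receives about the same amount of data regardless of
its capacity, so the 8TB nodes sit at roughly half full when the first 4TB
node runs out of space.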
Is there a recommended practice for this case? Would running the balancer
periodically help?