On Wed, Jul 7, 2010 at 9:18 PM, Arun Ramakrishnan <[email protected]> wrote:
> Looks like there is not much activity in the hdfs-user list. So, am reposting
> it in the general list.
>
> Hi guys.
> I have a few related questions. I am going to lay out the steps I have taken.
> Please comment on what I can do better.
>
> I was trying to add 5 nodes to my existing 10 node cluster and also
> increase the replication factor from 2 to 3.
> I thought I don't have to run the balancer because it would most likely put
> the new replicas onto the new nodes.
>
> There are about 500k blocks.
> I wanted to get it all stabilized (replication and balancing) within 24 hours.
> It's more than 24 hours now and fsck reports 30% under replication. Is there a
> way to force hdfs to balance/replicate more aggressively?
>
> It would be great if someone explained what/when things happen to blocks in
> the context of
>
> 1) Rebalancing
>
> 2) -setrep
>
> 3) Restarting cluster with a higher/lower replication factor.
>
> A few questions and a few issues here.
>
> 1) When you restart the cluster with a higher than previous replication
> value, does it also apply to existing blocks or only to new blocks being
> created?
>
> 2) Does the balancer take into account under replication of blocks or
> does it blindly start moving existing blocks to reach threshold?
>
> A very specific problem. I am having this strange problem where the -setrep
> hangs on one particular block for hours. Is this because it's corrupt? But
> fsck said it's healthy.
>
> Thanks
> Arun
>
> 2) -setrep

This will change the replication factor of an existing file (in the
background it should start replicating).

> 2) Does the balancer take into account under replication of blocks or does it
> blindly start moving existing blocks to reach threshold?

The most under-replicated files should be prioritized.

> 3) Restarting cluster with a higher/lower replication factor.

This only affects new files that are created where the client has not
specified a value.

> A very specific problem. I am having this strange problem where the -setrep
> hangs on one particular block for hours. Is this because it's corrupt? But
> fsck said it's healthy.

Not sure.

> It's more than 24 hours now and fsck reports 30% under

There is a configuration setting for maximum replication bandwidth. You might
have to tune that.
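
To get a feel for why the bandwidth limit matters, here is a rough back-of-the-envelope sketch in Python. It is not based on any HDFS API. The function name, the 64 MB average block size, and the per-node rates are all assumptions for illustration; real blocks are often smaller than the configured block size, and the NameNode's own scheduling limits are ignored. It only estimates how long it would take to copy one extra replica of each of the ~500k blocks if all 15 nodes transferred in parallel at a fixed rate.

```python
def estimate_hours(new_replicas, avg_block_mb, nodes, mb_per_sec_per_node):
    """Rough lower bound, in hours, on the time to copy `new_replicas` blocks.

    Assumes (hypothetically) that every node transfers in parallel at the
    same fixed rate and that nothing else throttles the copies.
    """
    total_mb = new_replicas * avg_block_mb
    cluster_mb_per_sec = nodes * mb_per_sec_per_node
    return total_mb / cluster_mb_per_sec / 3600


if __name__ == "__main__":
    # Going from replication 2 to 3 adds one replica per block: ~500k copies.
    # 64 MB per block and the rates below are assumed values, not measurements.
    for rate in (1, 10, 50):
        hours = estimate_hours(500_000, 64, 15, rate)
        print(f"{rate:>3} MB/s per node -> about {hours:.0f} hours")
```

If the effective per-node rate is low, an estimate like this can come out well beyond 24 hours, which would be consistent with fsck still reporting under-replicated blocks after a day.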
