Just a thought...

A really quick and dirty thing to do is to turn off the node.
Within 10 minutes the node looks down to the JT and NN, so it gets marked as
down.
Run an fsck and it will show the files as under-replicated. The cluster will
then re-replicate those blocks at the faster speed to rebalance itself.
(100MB/sec should be OK on a 1GbE link.)
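
To put rough numbers on that, here is a back-of-the-envelope sketch in Python.
Everything in it is an assumption for illustration: a decimal 1 TB of data on
the node, a nominal 1 GbE line rate, and a steady transfer rate with no
protocol or disk overhead. The function names are just made up for the example.

```python
# Rough replication-time estimate. Every figure here is an assumption
# for illustration, not something measured on a real cluster.

DATA_BYTES = 1_000_000_000_000          # ~1 TB held by the node (decimal TB assumed)
GBE_BYTES_PER_SEC = 1_000_000_000 / 8   # nominal 1 GbE line rate in bytes/sec
TARGET_BYTES_PER_SEC = 100_000_000      # the 100 MB/sec mentioned above


def hours_to_copy(data_bytes, bytes_per_sec):
    """Hours needed to move data_bytes at a steady bytes_per_sec."""
    return data_bytes / bytes_per_sec / 3600


def implied_rate(data_bytes, hours):
    """Average bytes/sec implied by moving data_bytes in the given hours."""
    return data_bytes / (hours * 3600)


if __name__ == "__main__":
    print(f"1 GbE line rate:  {GBE_BYTES_PER_SEC / 1e6:.0f} MB/s")
    print(f"1 TB at 100 MB/s: {hours_to_copy(DATA_BYTES, TARGET_BYTES_PER_SEC):.1f} hours")
    print(f"1 TB in 24 hours: {implied_rate(DATA_BYTES, 24) / 1e6:.1f} MB/s")
```

Under those assumptions, 1 TB at 100 MB/sec is a bit under 3 hours, while
taking a full day for the same data works out to an average of under 12 MB/sec.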

Then you can drop the next node... much faster than trying to decommission the 
node.

It's not the best way to do it, but it works.


> From: [email protected]
> Date: Fri, 12 Aug 2011 22:38:08 +0530
> Subject: Re: Speed up node under replicated block during decomission
> To: [email protected]
> 
> It could be that your process has hung because a particular resident
> block (file) requires a very large replication factor, and your
> remaining number of nodes is less than that value. This is a genuine
> reason for a hang (but it must be fixed). The process usually waits until
> there are no under-replicated blocks, so I'd use fsck to check whether any
> such blocks are present and setrep them to a lower value.
> 
> On Fri, Aug 12, 2011 at 9:28 PM,  <[email protected]> wrote:
> > Hi All,
> >
> > I'm trying to decommission a data node from my cluster.  I put the data 
> > node in the /usr/lib/hadoop/conf/dfs.hosts.exclude list and restarted the 
> > name nodes.  The under-replicated blocks are starting to replicate, but the 
> > count is going down at a very slow pace.  For 1 TB of data it takes over 
> > 1 day to complete.   We changed the settings as below to try to increase 
> > the replication rate.
> >
> > Added this to hdfs-site.xml on all the nodes on the cluster and restarted 
> > the data nodes and name node processes.
> > <property>
> >  <!-- 125 MiB/s (the value is in bytes per second) -->
> >  <name>dfs.balance.bandwidthPerSec</name>
> >  <value>131072000</value>
> > </property>
> >
> > Speed didn't seem to pick up. Do you know what may be happening?
> >
> > Thanks!
> > Jonathan
> >
> 
> 
> 
> -- 
> Harsh J
                                          
