RE: Decommission in hadoop-0.12.2

Dhruba Borthakur Tue, 27 Mar 2007 08:35:45 -0800

The decommission-in-progress state indicates that the Namenode is triggering
replication of blocks that reside on the node being decommissioned. When all
those blocks have been replicated to other Datanode(s), the state should
change to 'decommissioned'.


You can run bin/hadoop fsck -blocks -locations -files to list the
locations of all blocks in the fs (this might take a long time depending on
the number of files). Please verify whether the blocks that reside on the
decommission-in-progress node have 2 replicas. Once all those blocks have
two replicas (the original copy plus one new copy on another node, since you
have set the replication factor to 1), the decommissioning should be complete.
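
For a large fs, checking the fsck listing by eye is tedious. Below is a rough
sketch of a script that does the check described above. It assumes each block
line in the saved fsck output contains a block id of the form blk_<number> and
the datanode locations as ip:port; the exact fsck line format may differ between
versions, so treat the patterns as assumptions. The function name, the sample
lines and the address 10.0.0.2:50010 are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed shapes: block ids look like blk_123 or blk_-123, locations like ip:port.
BLOCK_RE = re.compile(r"blk_-?\d+")
ADDR_RE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}:\d+")


def under_replicated_on(fsck_text, node, wanted=2):
    """Return ids of blocks located on `node` that have fewer than `wanted` locations."""
    locations = defaultdict(set)
    for line in fsck_text.splitlines():
        block = BLOCK_RE.search(line)
        if block is None:
            continue  # not a block line (file header, summary, ...)
        locations[block.group(0)].update(ADDR_RE.findall(line))
    return sorted(
        block_id
        for block_id, addrs in locations.items()
        if node in addrs and len(addrs) < wanted
    )


# Made-up sample in an assumed format, not real fsck output.
sample = """\
/data/part-0 1024 bytes, 2 block(s):
0. blk_-101 len=512 repl=2 [81.93.168.215:50010, 10.0.0.2:50010]
1. blk_202 len=512 repl=1 [81.93.168.215:50010]
"""
print(under_replicated_on(sample, "81.93.168.215:50010"))  # ['blk_202']
```

An empty result for the decommission-in-progress node would mean every block
on it already has a second copy somewhere else.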

Thanks,
dhruba


-----Original Message-----
From: Espen Amble Kolstad [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, March 27, 2007 1:23 AM
To: [email protected]
Subject: Re: Decommission in hadoop-0.12.2

On Tuesday 27 March 2007 10:03:41 Andrzej Bialecki wrote:
> Espen Amble Kolstad wrote:
> > On Tuesday 27 March 2007 09:27:58 Andrzej Bialecki wrote:
> >> Espen Amble Kolstad wrote:
> >>> Hi,
> >>>
> >>> I'm trying to decommission a node with hadoop-0.12.2.
> >>> I use the property dfs.hosts.exclude, since the command hadoop
> >>> dfsadmin -decommission seems to be gone.
> >>> I then start the cluster with an empty exclude-file, add the name of
> >>> the node to decommission and run hadoop dfsadmin -refreshNodes.
> >>> The log then says:
> >>> 2007-03-27 08:42:59,168 INFO  fs.FSNamesystem - Start Decommissioning
> >>> node 81.93.168.215:50010
> >>>
> >>> But nothing happens.
> >>> I've left it in this state over night, but still nothing.
> >>>
> >>> Am I missing something ?
> >>
> >> What does dfsadmin -report say about this node? It takes time to
> >> ensure that all blocks are replicated from this node to other nodes.
> >
> > Hi,
> >
> > dfsadmin -report:
> >
> > Name: 81.93.168.215:50010
> > State          : Decommission in progress
> > Total raw bytes: 1438871724032 (1.30 TB)
> > Used raw bytes: 270070137404 (0.24 TB)
> > % used: 18.76%
> > Last contact: Tue Mar 27 09:42:26 CEST 2007
> >
> > In the web-interface (dfshealth.jsp) no change can be seen in % or the
> > number of blocks on any of the nodes.
>
> You may want to check the datanode logs to see whether any exceptions
> are reported. Also, things are taking time - I believe the datanodes
> synchronize their block information piecewise, so that they don't
> overwhelm the namenode. It surely takes some time in my case, even
> though the disk size per node that I use is much smaller.
>
> Regarding the number of blocks - if all blocks are already present on
> other datanodes at least in 1 copy, then no new blocks need to be
> created - I'm not sure when the namenode decides that these blocks
> should get additional replicas: during the decommissioning or after it's
> complete ...
>
> It would be nice to have a progress meter on the decommissioning
> process, though.

Hi,

I have replication set to 1 for the whole hdfs, so there should not be any 
other replicas.
I can't find any errors in my logs. And the namenode-log looks like this (at
INFO level):
2007-03-27 08:42:59,168 INFO  fs.FSNamesystem - Start Decommissioning node 81.93.168.215:50010
2007-03-27 09:04:48,831 INFO  fs.FSNamesystem - Roll Edit Log
2007-03-27 09:04:49,500 INFO  fs.FSNamesystem - Roll FSImage
2007-03-27 10:04:50,221 INFO  fs.FSNamesystem - Roll Edit Log
2007-03-27 10:04:50,360 INFO  fs.FSNamesystem - Roll FSImage

- Espen
