The decommission-in-progress state indicates that the Namenode is triggering replication of the blocks that reside on the node being decommissioned. When all of those blocks have been replicated to other Datanode(s), the state should change to "decommissioned".
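The completion rule above can be sketched as a small check. This is a hypothetical illustration, not Namenode code: it assumes the block locations have already been collected into a dict that maps each block ID to the datanodes holding a replica (for example, assembled by hand from fsck -locations output). The function names, the `replication` parameter and the extra node addresses in the example are made up for illustration.

```python
from typing import Dict, List


def pending_blocks(
    locations: Dict[str, List[str]],
    decommissioning: str,
    replication: int = 1,
) -> List[str]:
    """Return the IDs of blocks on `decommissioning` that still lack enough
    replicas elsewhere (hypothetical helper).

    `locations` maps a block ID to the datanodes (host:port) holding a
    replica. A block is pending while fewer than `replication` of its
    replicas live on nodes other than the one being decommissioned.
    """
    pending = []
    for block_id, nodes in locations.items():
        if decommissioning not in nodes:
            continue  # block is unaffected by this decommission
        elsewhere = {n for n in nodes if n != decommissioning}
        if len(elsewhere) < replication:
            pending.append(block_id)
    return sorted(pending)


def decommission_complete(
    locations: Dict[str, List[str]],
    decommissioning: str,
    replication: int = 1,
) -> bool:
    """True once every block on the node also has `replication` copies elsewhere."""
    return not pending_blocks(locations, decommissioning, replication)


if __name__ == "__main__":
    node = "81.93.168.215:50010"
    # Hypothetical block map: block IDs and the 10.0.0.x nodes are made up.
    blocks = {
        "blk_1": [node],                    # only copy is on the node
        "blk_2": [node, "10.0.0.2:50010"],  # already copied once
        "blk_3": ["10.0.0.3:50010"],        # not on the node at all
    }
    print(pending_blocks(blocks, node))         # ['blk_1']
    print(decommission_complete(blocks, node))  # False
```

With a replication factor of 1, a block counts as done once it has one copy off the node, i.e. two replicas in total, which matches the "2 replicas" check described in the next paragraph.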
You can run "bin/hadoop fsck / -files -blocks -locations" to list the locations of every block in the filesystem (this might take a long time, depending on the number of files). Please check whether the blocks that reside on the decommission-in-progress node have 2 replicas yet. Because you have set the replication factor to 1, each of those blocks needs one extra copy on another Datanode; once all of them have two replicas, decommissioning should be complete.

Thanks,
dhruba

-----Original Message-----
From: Espen Amble Kolstad [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 27, 2007 1:23 AM
To: [email protected]
Subject: Re: Decommission in hadoop-0.12.2

On Tuesday 27 March 2007 10:03:41 Andrzej Bialecki wrote:
> Espen Amble Kolstad wrote:
> > On Tuesday 27 March 2007 09:27:58 Andrzej Bialecki wrote:
> >> Espen Amble Kolstad wrote:
> >>> Hi,
> >>>
> >>> I'm trying to decommission a node with hadoop-0.12.2.
> >>> I use the property dfs.hosts.exclude, since the command hadoop
> >>> dfsadmin -decommission seems to be gone.
> >>> I then start the cluster with an empty exclude file, add the name of
> >>> the node to decommission and run hadoop dfsadmin -refreshNodes.
> >>> The log then says:
> >>> 2007-03-27 08:42:59,168 INFO fs.FSNamesystem - Start Decommissioning
> >>> node 81.93.168.215:50010
> >>>
> >>> But nothing happens.
> >>> I've left it in this state overnight, but still nothing.
> >>>
> >>> Am I missing something?
> >>
> >> What does dfsadmin -report say about this node? It takes time to
> >> ensure that all blocks are replicated from this node to other nodes.
> >
> > Hi,
> >
> > dfsadmin -report:
> >
> > Name: 81.93.168.215:50010
> > State : Decommission in progress
> > Total raw bytes: 1438871724032 (1.30 TB)
> > Used raw bytes: 270070137404 (0.24 TB)
> > % used: 18.76%
> > Last contact: Tue Mar 27 09:42:26 CEST 2007
> >
> > In the web interface (dfshealth.jsp) no change can be seen in % or the
> > number of blocks on any of the nodes.
>
> You may want to check the datanode logs for any reported exceptions.
> Also, things take time: I believe the datanodes synchronize their block
> information piecewise, so that they don't overwhelm the namenode. It
> surely takes some time in my case, even though the disk size per node
> that I use is much smaller.
>
> Regarding the number of blocks: if all blocks are already present on
> other datanodes in at least 1 copy, then no new blocks need to be
> created. I'm not sure when the namenode decides that these blocks
> should get additional replicas: during the decommissioning or after it's
> complete ...
>
> It would be nice to have a progress meter on the decommissioning
> process, though.

Hi,

I have replication set to 1 for the whole HDFS, so there should not be any other replicas.
I can't find any errors in my logs.

And the namenode log looks like this (at INFO level):

2007-03-27 08:42:59,168 INFO fs.FSNamesystem - Start Decommissioning node 81.93.168.215:50010
2007-03-27 09:04:48,831 INFO fs.FSNamesystem - Roll Edit Log
2007-03-27 09:04:49,500 INFO fs.FSNamesystem - Roll FSImage
2007-03-27 10:04:50,221 INFO fs.FSNamesystem - Roll Edit Log
2007-03-27 10:04:50,360 INFO fs.FSNamesystem - Roll FSImage

- Espen
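The dfsadmin -report output quoted in the thread lists one "Name:" entry per datanode, followed by "Key: value" lines. Below is a minimal sketch for picking out the nodes whose State is still "Decommission in progress". It assumes the report text has been captured to a string (for example, by redirecting hadoop dfsadmin -report to a file) and that it follows the layout shown in the thread; the exact format may differ between Hadoop versions, and the function names are hypothetical.

```python
from typing import Dict, List


def parse_report(text: str) -> List[Dict[str, str]]:
    """Split dfsadmin -report text into one dict per datanode.

    A new record starts at each "Name:" line; the other "Key: value" lines
    are attached to the most recent record. Lines without a colon, or seen
    before the first "Name:" (cluster-wide summary), are ignored.
    """
    nodes: List[Dict[str, str]] = []
    for raw in text.splitlines():
        # Split on the first colon only, so "host:port" values stay intact.
        key, sep, value = raw.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Name":
            nodes.append({"Name": value})
        elif nodes:
            nodes[-1][key] = value
    return nodes


def decommissioning_nodes(text: str) -> List[str]:
    """Names of datanodes whose State is still 'Decommission in progress'."""
    return [
        node["Name"]
        for node in parse_report(text)
        if node.get("State") == "Decommission in progress"
    ]


if __name__ == "__main__":
    # The report excerpt from the thread.
    sample = """Name: 81.93.168.215:50010
State : Decommission in progress
Total raw bytes: 1438871724032 (1.30 TB)
Used raw bytes: 270070137404 (0.24 TB)
% used: 18.76%
Last contact: Tue Mar 27 09:42:26 CEST 2007
"""
    print(decommissioning_nodes(sample))  # ['81.93.168.215:50010']
```

Splitting on the first colon only is what keeps the "host:port" node name and the timestamp in "Last contact" whole; running this periodically is one rough way to approximate the progress meter wished for in the thread.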
