On Tuesday 27 March 2007 10:03:41 Andrzej Bialecki wrote:
> Espen Amble Kolstad wrote:
> > On Tuesday 27 March 2007 09:27:58 Andrzej Bialecki wrote:
> >> Espen Amble Kolstad wrote:
> >>> Hi,
> >>>
> >>> I'm trying to decommission a node with hadoop-0.12.2.
> >>> I use the property dfs.hosts.exclude, since the command hadoop
> >>> dfsadmin -decommission seems to be gone.
> >>> I then start the cluster with an empty exclude-file, add the name of
> >>> the node to decommission and run hadoop dfsadmin -refreshNodes.
> >>> The log then says:
> >>> 2007-03-27 08:42:59,168 INFO fs.FSNamesystem - Start Decommissioning
> >>> node 81.93.168.215:50010
> >>>
> >>> But nothing happens.
> >>> I've left it in this state overnight, but still nothing.
> >>>
> >>> Am I missing something?
> >>
> >> What does dfsadmin -report say about this node? It takes time to
> >> ensure that all blocks are replicated from this node to other nodes.
> >
> > Hi,
> >
> > dfsadmin -report:
> >
> > Name: 81.93.168.215:50010
> > State : Decommission in progress
> > Total raw bytes: 1438871724032 (1.30 TB)
> > Used raw bytes: 270070137404 (0.24 TB)
> > % used: 18.76%
> > Last contact: Tue Mar 27 09:42:26 CEST 2007
> >
> > In the web-interface (dfshealth.jsp) no change can be seen in % or the
> > number of blocks on any of the nodes.
>
> You may want to check the datanode logs if there are any exceptions
> reported. Also, things are taking time - I believe the datanodes
> synchronize their block information piecewise, so that they don't
> overwhelm the namenode. It surely takes some time in my case, even
> though the disk size per node that I use is much smaller.
>
> Regarding the number of blocks - if all blocks are already present on
> other datanodes at least in 1 copy, then no new blocks need to be
> created - I'm not sure when the namenode decides that these blocks
> should get additional replicas: during the decommissioning or after it's
> complete ...
>
> It would be nice to have a progress meter on the decommissioning
> process, though.

Hi,

I have replication set to 1 for the whole hdfs, so there should not be any
other replicas.
I can't find any errors in my logs. And the namenode-log looks like this
(at INFO level):

2007-03-27 08:42:59,168 INFO fs.FSNamesystem - Start Decommissioning node 81.93.168.215:50010
2007-03-27 09:04:48,831 INFO fs.FSNamesystem - Roll Edit Log
2007-03-27 09:04:49,500 INFO fs.FSNamesystem - Roll FSImage
2007-03-27 10:04:50,221 INFO fs.FSNamesystem - Roll Edit Log
2007-03-27 10:04:50,360 INFO fs.FSNamesystem - Roll FSImage

- Espen
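
P.S. Until there is a real progress meter, a rough stand-in might be to save the output of dfsadmin -report at intervals and compare the node's state and used bytes between runs. Below is a minimal sketch in Python. It assumes the per-node section keeps the "Key: value" layout quoted earlier in this thread; the function names (parse_report, used_bytes, decommissioning) are made up for illustration and are not part of Hadoop.

```python
# Sketch only: parses text in the per-node layout quoted in this thread.
# The real report may contain other lines; anything before the first
# "Name:" line is ignored here.

SAMPLE = """Name: 81.93.168.215:50010
State : Decommission in progress
Total raw bytes: 1438871724032 (1.30 TB)
Used raw bytes: 270070137404 (0.24 TB)
% used: 18.76%
Last contact: Tue Mar 27 09:42:26 CEST 2007
"""


def parse_report(text):
    """Return one dict per datanode, keyed by the report's field names."""
    nodes = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if ":" not in line:
            continue
        # Split on the first colon only, so "host:port" values stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Name":
            current = {"Name": value}
            nodes.append(current)
        elif current is not None:
            current[key] = value
    return nodes


def used_bytes(node):
    """Extract the integer byte count from the 'Used raw bytes' field."""
    return int(node["Used raw bytes"].split()[0])


def decommissioning(nodes):
    """Return the nodes whose state is still 'Decommission in progress'."""
    return [n for n in nodes if n.get("State") == "Decommission in progress"]


if __name__ == "__main__":
    for node in decommissioning(parse_report(SAMPLE)):
        print(node["Name"], used_bytes(node))
```

If used_bytes for the node stays the same across several saved reports while the state remains "Decommission in progress", that would match what I am seeing: no blocks appear to be moving.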
