Espen Amble Kolstad wrote:
On Tuesday 27 March 2007 09:27:58 Andrzej Bialecki wrote:
Espen Amble Kolstad wrote:
Hi,
I'm trying to decommission a node with hadoop-0.12.2.
I'm using the dfs.hosts.exclude property, since the command hadoop
dfsadmin -decommission seems to be gone.
I start the cluster with an empty exclude file, add the name of the
node to decommission, and run hadoop dfsadmin -refreshNodes.
The log then says:
2007-03-27 08:42:59,168 INFO fs.FSNamesystem - Start Decommissioning
node 81.93.168.215:50010
But nothing happens.
I've left it in this state over night, but still nothing.
Am I missing something ?
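(For reference, the exclude-file workflow described above looks roughly like the sketch below. The file paths are assumptions; adjust them to your installation and configuration directory.)

```shell
# Sketch of the exclude-file decommission workflow (paths are assumptions).
#
# 1. Point the namenode at an exclude file in hadoop-site.xml:
#      <property>
#        <name>dfs.hosts.exclude</name>
#        <value>/path/to/conf/exclude</value>
#      </property>
#
# 2. Start the cluster with the exclude file empty, then add the
#    node to decommission:
echo "81.93.168.215" >> /path/to/conf/exclude

# 3. Tell the namenode to re-read its include/exclude lists:
bin/hadoop dfsadmin -refreshNodes

# 4. Watch the node's state until it changes from
#    "Decommission in progress" to "Decommissioned":
bin/hadoop dfsadmin -report
```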
What does dfsadmin -report say about this node? It takes time to
ensure that all blocks on this node are replicated to other nodes.
Hi,
dfsadmin -report:
Name: 81.93.168.215:50010
State : Decommission in progress
Total raw bytes: 1438871724032 (1.30 TB)
Used raw bytes: 270070137404 (0.24 TB)
% used: 18.76%
Last contact: Tue Mar 27 09:42:26 CEST 2007
In the web-interface (dfshealth.jsp) no change can be seen in % or the number
of blocks on any of the nodes.
You may want to check the datanode logs for any exceptions.
Also, these things take time - I believe the datanodes
synchronize their block information piecewise, so that they don't
overwhelm the namenode. It certainly takes some time in my case, even
though the disk size per node that I use is much smaller.
Regarding the number of blocks - if all blocks are already present on
other datanodes in at least one copy, then no new blocks need to be
created. I'm not sure when the namenode decides that these blocks
should get additional replicas: during the decommissioning or after it's
complete ...
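(One way to inspect replica counts directly, rather than inferring them from the web interface, is the fsck tool; the exact options available in this version may differ, so treat this as a sketch:)

```shell
# List every file under / with its blocks and the datanodes holding
# each replica; blocks still held only on the decommissioning node
# should show up as under-replicated until re-replication finishes.
bin/hadoop fsck / -files -blocks -locations
```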
It would be nice to have a progress meter on the decommissioning
process, though.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com