I was following the steps at <http://wiki.apache.org/hadoop/FAQ#17> to do the decommission. However, you have to be patient with it, since it seems to take a long time. If it took 3-5 minutes on my nodes, which have no data and no jobs running, I can't imagine how long it would take on a real cluster. One thing I had trouble with originally was that it doesn't seem to work if your replication factor is set to the same value as your number of machines (since I was just testing things, I had replication set to 2 with 2 machines, but that's not a good real-world example).
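The FAQ procedure amounts to listing the host in the exclude file named by the dfs.hosts.exclude property and then running "hadoop dfsadmin -refreshNodes". The Python sketch below automates the file edit and adds a rough pre-check for the replication pitfall just described: decommissioning can only finish if enough nodes remain to hold every replica of a block. The function names are hypothetical, the exclude-file path is whatever your own configuration sets, and the check is a simplified rule of thumb, not the namenode's actual logic. The sketch only returns the refresh command; it does not run it.

```python
import os
import tempfile
from typing import List

# Command that makes the namenode re-read its include/exclude host files.
REFRESH_CMD = ["hadoop", "dfsadmin", "-refreshNodes"]


def can_finish_decommission(live_nodes: int, leaving: int, replication: int) -> bool:
    """Rule-of-thumb check (hypothetical helper): after `leaving` nodes are
    decommissioned, are at least `replication` nodes left to hold each block?"""
    return live_nodes - leaving >= replication


def add_to_exclude_file(path: str, hosts: List[str]) -> List[str]:
    """Hypothetical helper: add hosts (one per line) to the exclude file,
    skipping any already listed, and return the refresh command to run next."""
    existing: List[str] = []
    if os.path.exists(path):
        with open(path) as f:
            existing = [line.strip() for line in f if line.strip()]
    for host in hosts:
        if host not in existing:
            existing.append(host)
    # Rewrite the whole file so the result always ends with a newline.
    with open(path, "w") as f:
        f.write("\n".join(existing) + "\n")
    return list(REFRESH_CMD)


if __name__ == "__main__":
    # The two-machine, replication-2 test setup described above.
    print(can_finish_decommission(live_nodes=2, leaving=1, replication=2))  # False
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "excludes")  # stand-in for the dfs.hosts.exclude file
        print(add_to_exclude_file(path, ["slave"]))  # ['hadoop', 'dfsadmin', '-refreshNodes']
```

With 2 machines and replication 2, removing one machine leaves a single node, which cannot hold two copies of a block, so the check returns False for that setup.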
The problem I'm having, though (from Jeremy's reply earlier, it sounds like he misinterpreted it), isn't how long it takes for the node to go from Decommissioned to being recognized by the master as dead. Whether or not it's recognized as dead doesn't matter for what I'm doing. The real problem is that going from the In Service state to the Decommissioned state takes forever: Decommission In Progress lasts 3 to 5 minutes even though there are no jobs or data on those nodes. If anyone has any idea why that might be (I can see why it would take time if there were jobs or data, but not otherwise), please let me know.

- Alyssa

________________________________________
From: Rob Hamilton [[email protected]]
Sent: Thursday, January 22, 2009 12:26 PM
To: [email protected]
Subject: RE: Decommissioning Nodes

I wasn't able to get decommissioning to work at all and found that just taking the node down got it out of the cluster. What version are you running, and how are you initiating the decommissioning?

-Rob

Rob Hamilton - VP Network Operations

-----Original Message-----
From: Hargraves, Alyssa [mailto:[email protected]]
Sent: Wednesday, January 21, 2009 7:35 PM
To: [email protected]
Subject: Decommissioning Nodes

Hello Hadoop Users,

I was hoping someone would be able to answer a question about node decommissioning. I have a test Hadoop cluster set up which consists only of my computer and a master node. I am looking at the removal and addition of nodes. Adding a node is nearly instant (only about 5 seconds), but removing a node by decommissioning it takes a while, and I don't understand why. Currently, the systems are running no map/reduce tasks and storing no data.

DFS Health reports:

7 files and directories, 0 blocks = 7 total.
Heap Size is 6.68 MB / 992.31 MB (0%)
Capacity      : 298.02 GB
DFS Remaining : 245.79 GB
DFS Used      : 4 KB
DFS Used%     : 0 %
Live Nodes    : 2
Dead Nodes    : 0

Node    Last Contact  Admin State               Size (GB)  Used (%)  Remaining (GB)  Blocks
master  0             In Service                149.01     0         122.22          0
slave   82            Decommission In Progress  149.01     0         123.58          0

However, even with nothing stored and nothing running, the decommission process takes 3 to 5 minutes, and I'm not quite sure why. There isn't any data to move anywhere, and there aren't any jobs to worry about. I am using 0.18.2.

Thank you for any help in solving this,
Alyssa Hargraves
