Thanks for the link.
I followed that guide, and now I'm seeing rather strange behavior. If
I have dfs.hosts set (I didn't when I wrote my last email) to an empty
file when I start the cluster, nothing happens when I -refreshNodes; I
take it that's expected. If it's set to the hosts I want to keep, none
of the datanodes come up at startup; they all die with the error
below. On the dfshealth page, they're all listed as dead. If instead
the file is empty at startup and I then add the hosts, every datanode
dies when I -refreshNodes.
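
In case it matters, the include file dfs.hosts points to is just a
list of hostnames, one per line; mine looks like this (hostnames
changed, like the log below):

  node01.example.com
  node02.example.com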
Thoughts? I'm running 0.18.2. (We haven't moved to Java 6 here yet.)
Thanks!
-- David
2008-12-04 01:18:10,909 ERROR org.apache.hadoop.dfs.DataNode: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.DisallowedDatanodeException: Datanode denied communication with namenode: HOST:PORT  # changed
        at org.apache.hadoop.dfs.FSNamesystem.registerDatanode(FSNamesystem.java:1938)
        at org.apache.hadoop.dfs.NameNode.register(NameNode.java:585)
        at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
        at org.apache.hadoop.ipc.Client.call(Client.java:715)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at org.apache.hadoop.dfs.$Proxy4.register(Unknown Source)
        at org.apache.hadoop.dfs.DataNode.register(DataNode.java:529)
        at org.apache.hadoop.dfs.DataNode.runDatanodeDaemon(DataNode.java:2960)
        at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:2995)
        at org.apache.hadoop.dfs.DataNode.main(DataNode.java:3116)
On Thu, Dec 4, 2008 at 9:12 AM, Konstantin Shvachko <[EMAIL PROTECTED]> wrote:
> Just for reference, these links:
> http://wiki.apache.org/hadoop/FAQ#17
> http://hadoop.apache.org/core/docs/r0.19.0/hdfs_user_guide.html#DFSAdmin+Command
>
> Decommissioning does not happen all at once.
> -refreshNodes just starts the process; it does not complete it.
> There could be a lot of blocks on the nodes you want to decommission,
> and replication takes time.
> The progress can be monitored on the name-node web UI.
> Right after -refreshNodes, the web UI will show the nodes you chose
> for decommission in the "Decommission In Progress" state; wait until
> that changes to "Decommissioned" before turning the node off.
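>
> The same state should also be visible from the command line; if I
> remember correctly, "bin/hadoop dfsadmin -report" prints a per-node
> decommission status, something like:
>
>   Decommission Status : Decommission in progress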
>
> --Konstantin
>
>
> David Hall wrote:
>>
>> I'm starting to think I'm doing things wrong.
>>
>> I have dfs.hosts.exclude set to an absolute path to a file listing
>> what I want decommissioned, and dfs.hosts listing the nodes I want
>> to remain commissioned (it points to the slaves file).
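>>
>> For concreteness, the relevant part of my hadoop-site.xml looks
>> roughly like this (paths changed):
>>
>>   <property>
>>     <name>dfs.hosts</name>
>>     <value>/path/to/conf/slaves</value>
>>   </property>
>>   <property>
>>     <name>dfs.hosts.exclude</name>
>>     <value>/path/to/conf/excluding</value>
>>   </property>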
>>
>> Nothing seems to do anything...
>>
>> What am I missing?
>>
>> -- David
>>
>> On Thu, Dec 4, 2008 at 12:48 AM, David Hall <[EMAIL PROTECTED]> wrote:
>>>
>>> Hi,
>>>
>>> I'm trying to decommission some nodes. The process I tried to follow is:
>>>
>>> 1) add them to conf/excluding (hadoop-site points there)
>>> 2) invoke hadoop dfsadmin -refreshNodes
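>>>
>>> Concretely, what I ran (hostname changed):
>>>
>>>   echo somenode.example.com >> conf/excluding
>>>   bin/hadoop dfsadmin -refreshNodes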
>>>
>>> This returned immediately, so I thought it was done; I killed off
>>> the cluster and rebooted without the excluded nodes, but then fsck
>>> was very unhappy...
>>>
>>> Is there some way to watch the progress of decommissioning?
>>>
>>> Thanks,
>>> -- David
>>>
>>
>