[
https://issues.apache.org/jira/browse/HDDS-9481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HDDS-9481:
---------------------------------
Labels: pull-request-available (was: )
> A reformatted datanode node cannot be decommissioned
> ----------------------------------------------------
>
> Key: HDDS-9481
> URL: https://issues.apache.org/jira/browse/HDDS-9481
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Labels: pull-request-available
>
> If a datanode registered with SCM is stopped, its data disks are cleared, and
> it is then restarted, it reconnects to SCM as a new node with a new UUID.
> When this happens, the old datanode details remain in SCM as a dead node, and
> the table mapping DNs running on a host to their UUIDs contains two entries.
> The decommission command cannot decide which entry should be decommissioned
> and fails with this error:
> {code}
> 2023-10-03 08:05:50,279 ERROR [IPC Server handler 25 on 9860]-org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer: Failed to decommission nodes
> org.apache.hadoop.hdds.scm.node.InvalidHostStringException: Host host1.acme.org is running multiple datanodes registered with SCM, but no port numbers match. Please check the port number.
>     at org.apache.hadoop.hdds.scm.node.NodeDecommissionManager.mapHostnamesToDatanodes(NodeDecommissionManager.java:151)
>     at org.apache.hadoop.hdds.scm.node.NodeDecommissionManager.decommissionNodes(NodeDecommissionManager.java:228)
>     at org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer.decommissionNodes(SCMClientProtocolServer.java:624)
>     at org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.decommissionNodes(StorageContainerLocationProtocolServerSideTranslatorPB.java:1114)
>     at org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.processRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:602)
>     at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
>     at org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.submitRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:221)
>     at org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
>     at java.base/java.security.AccessController.doPrivileged(Native Method)
>     at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899)
> {code}
> It is valid for multiple DNs to run on the same host, especially on test
> clusters or mini-clusters. However, it is not possible for two DNs to be
> heartbeating from the same host on the same ports.
> Therefore, when we try to decommission a host that has multiple entries with
> identical ports across all entries, we can safely decommission the one with
> the newest heartbeat.
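The tie-break described above can be sketched as follows. This is a minimal illustration using hypothetical types, not the real Ozone DatanodeDetails/NodeDecommissionManager API: among duplicate entries that share a host and ports, pick the entry whose heartbeat is most recent.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a registered datanode entry in SCM.
// The real implementation would use DatanodeDetails plus node status
// from the NodeManager; names here are illustrative only.
class DnEntry {
    final String uuid;
    final long lastHeartbeatMs;

    DnEntry(String uuid, long lastHeartbeatMs) {
        this.uuid = uuid;
        this.lastHeartbeatMs = lastHeartbeatMs;
    }
}

public class DecommissionTieBreak {
    // Given duplicate entries for one host where all ports match,
    // select the entry with the newest heartbeat (the live datanode);
    // the stale dead-node entry is left alone.
    static DnEntry pickNewest(List<DnEntry> duplicates) {
        return duplicates.stream()
            .max(Comparator.comparingLong(d -> d.lastHeartbeatMs))
            .orElseThrow();
    }

    public static void main(String[] args) {
        DnEntry stale = new DnEntry("old-uuid", 1_000L);  // dead, reformatted node
        DnEntry live = new DnEntry("new-uuid", 9_000L);   // currently heartbeating
        System.out.println(pickNewest(List.of(stale, live)).uuid);
    }
}
```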
--
This message was sent by Atlassian Jira
(v8.20.10#820010)