[ 
https://issues.apache.org/jira/browse/HDDS-12423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-12423:
------------------------------------
    Status: Patch Available  (was: Open)

> Use inner view of DatanodeDetail in setNodeOperationalState
> -----------------------------------------------------------
>
>                 Key: HDDS-12423
>                 URL: https://issues.apache.org/jira/browse/HDDS-12423
>             Project: Apache Ozone
>          Issue Type: Improvement
>            Reporter: Janus Chow
>            Assignee: Janus Chow
>            Priority: Major
>              Labels: pull-request-available
>
> We encountered the following error:
> {code:java}
> 2025-02-10 10:03:19,995 
> [EventQueue-HealthyReadonlyNodeForHealthyReadOnlyNodeHandler] INFO 
> org.apache.hadoop.hdds.scm.node.HealthyReadOnlyNodeHandler: Datanode 
> 4570a118-82ab-44fe-98c6-fad28dc9f622{ip: 10.169.59.142, host: 
> ip-10-169-59-142.idata-server.shopee.io, ports: [CLIENT_RPC=9864, 
> REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, 
> STANDALONE=9859], networkLocation: /DC/RACK, certSerialId: null, 
> persistedOpState: ENTERING_MAINTENANCE, persistedOpStateExpiryEpochSec: 0} 
> moved to HEALTHY READONLY state.
> 2025-02-10 10:03:19,995 
> [EventQueue-HealthyReadonlyNodeForHealthyReadOnlyNodeHandler] INFO 
> org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl: Added a new node: 
> /DC/RACK/4570a118-82ab-44fe-98c6-fad28dc9f622
> 2025-02-10 10:03:20,470 [EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] 
> WARN org.apache.hadoop.hdds.scm.block.DeletedBlockLogImpl: Skip commit 
> transactions since current SCM is not leader.
> 2025-02-10 10:03:20,756 [IPC Server handler 76 on default port 9861] INFO 
> org.apache.hadoop.hdds.scm.node.SCMNodeManager: Update the operationalState 
> saved in follower SCM for 4570a118-82ab-44fe-98c6-fad28dc9f622{ip: 
> 10.169.59.142, host: ip-10-169-59-142.idata-server.shopee.io, ports: 
> [CLIENT_RPC=9864, REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, 
> RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default-rack, 
> certSerialId: null, persistedOpState: IN_SERVICE, 
> persistedOpStateExpiryEpochSec: 0} as the reported value does not match the 
> value stored in SCM (ENTERING_MAINTENANCE, 0)
> 2025-02-10 10:03:20,756 
> [EventQueue-HealthyReadonlyNodeForHealthyReadOnlyNodeHandler] INFO 
> org.apache.hadoop.hdds.scm.node.HealthyReadOnlyNodeHandler: Datanode 
> 4570a118-82ab-44fe-98c6-fad28dc9f622{ip: 10.169.59.142, host: 
> ip-10-169-59-142.idata-server.shopee.io, ports: [CLIENT_RPC=9864, 
> REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, 
> STANDALONE=9859], networkLocation: /default-rack, certSerialId: null, 
> persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0} moved to 
> HEALTHY READONLY state.
> 2025-02-10 10:03:20,756 
> [EventQueue-HealthyReadonlyNodeForHealthyReadOnlyNodeHandler] ERROR 
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor: Error on execution 
> message 4570a118-82ab-44fe-98c6-fad28dc9f622{ip: 10.169.59.142, host: 
> ip-10-169-59-142.idata-server.shopee.io, ports: [CLIENT_RPC=9864, 
> REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, 
> STANDALONE=9859], networkLocation: /default-rack, certSerialId: null, 
> persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}
> org.apache.hadoop.hdds.scm.net.NetworkTopology$InvalidTopologyException: 
> Failed to add /default-rack/ip-10-169-59-142.idata-server.shopee.io: Its path 
> depth is not 4
>     at 
> org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.add(NetworkTopologyImpl.java:101)
>     at 
> org.apache.hadoop.hdds.scm.node.HealthyReadOnlyNodeHandler.onMessage(HealthyReadOnlyNodeHandler.java:75)
>     at 
> org.apache.hadoop.hdds.scm.node.HealthyReadOnlyNodeHandler.onMessage(HealthyReadOnlyNodeHandler.java:39)
>     at 
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748) {code}
> The error is caused by incorrect rack information reported in the Datanode's 
> heartbeat. We should use SCM's view of the Datanode here instead.
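
The idea can be illustrated with a minimal, self-contained sketch. It is not the Ozone implementation: the Node class, the SCM_VIEW map, and the depth helper are hypothetical stand-ins, and the sketch assumes that path depth counts the root, so that /DC/RACK/<node> has depth 4. It only shows why resolving SCM's own record by UUID, rather than passing on the object reported in the heartbeat, keeps the rack path consistent with the topology.

```java
import java.util.HashMap;
import java.util.Map;

public class ScmViewSketch {

  /** Hypothetical, simplified datanode record (not Ozone's DatanodeDetails). */
  static final class Node {
    final String uuid;
    final String networkLocation;
    String opState;

    Node(String uuid, String networkLocation, String opState) {
      this.uuid = uuid;
      this.networkLocation = networkLocation;
      this.opState = opState;
    }
  }

  /** Hypothetical registry holding SCM's own view of each node, keyed by UUID. */
  static final Map<String, Node> SCM_VIEW = new HashMap<>();

  /** Path depth including the root (assumption): "/DC/RACK/n1" has depth 4. */
  static int depth(String path) {
    // Splitting "/DC/RACK/n1" on "/" yields ["", "DC", "RACK", "n1"].
    return path.split("/").length;
  }

  /**
   * Updates the operational state on SCM's own record and returns that record,
   * so later steps see SCM's rack location rather than the reported one.
   */
  static Node setNodeOperationalState(Node reported, String newState) {
    Node inner = SCM_VIEW.get(reported.uuid);
    if (inner == null) {
      throw new IllegalArgumentException("Unknown datanode " + reported.uuid);
    }
    inner.opState = newState;
    return inner;
  }

  public static void main(String[] args) {
    // SCM registered the node under /DC/RACK.
    SCM_VIEW.put("dn-1", new Node("dn-1", "/DC/RACK", "ENTERING_MAINTENANCE"));
    // The heartbeat carries a different rack and operational state.
    Node reported = new Node("dn-1", "/default-rack", "IN_SERVICE");

    Node inner = setNodeOperationalState(reported, reported.opState);

    // Reported view: too shallow for a depth-4 topology.
    System.out.println(depth(reported.networkLocation + "/" + reported.uuid));
    // SCM's view: matches the expected depth.
    System.out.println(depth(inner.networkLocation + "/" + inner.uuid));
  }
}
```

In this sketch the reported object yields a path of depth 3 and SCM's record yields depth 4, which mirrors the "path depth is not 4" failure in the log above.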



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

