[
https://issues.apache.org/jira/browse/HDDS-12423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Attila Doroszlai updated HDDS-12423:
------------------------------------
Status: Patch Available (was: Open)
> Use inner view of DatanodeDetail in setNodeOperationalState
> -----------------------------------------------------------
>
> Key: HDDS-12423
> URL: https://issues.apache.org/jira/browse/HDDS-12423
> Project: Apache Ozone
> Issue Type: Improvement
> Reporter: Janus Chow
> Assignee: Janus Chow
> Priority: Major
> Labels: pull-request-available
>
> We encountered the following error:
> {code:java}
> 2025-02-10 10:03:19,995
> [EventQueue-HealthyReadonlyNodeForHealthyReadOnlyNodeHandler] INFO
> org.apache.hadoop.hdds.scm.node.HealthyReadOnlyNodeHandler: Datanode
> 4570a118-82ab-44fe-98c6-fad28dc9f622{ip: 10.169.59.142, host:
> ip-10-169-59-142.idata-server.shopee.io, ports: [CLIENT_RPC=9864,
> REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856,
> STANDALONE=9859], networkLocation: /DC/RACK, certSerialId: null,
> persistedOpState: ENTERING_MAINTENANCE, persistedOpStateExpiryEpochSec: 0}
> moved to HEALTHY READONLY state.
> 2025-02-10 10:03:19,995
> [EventQueue-HealthyReadonlyNodeForHealthyReadOnlyNodeHandler] INFO
> org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl: Added a new node:
> /DC/RACK/4570a118-82ab-44fe-98c6-fad28dc9f622
> 2025-02-10 10:03:20,470 [EventQueue-DeleteBlockStatusForDeletedBlockLogImpl]
> WARN org.apache.hadoop.hdds.scm.block.DeletedBlockLogImpl: Skip commit
> transactions since current SCM is not leader.
> 2025-02-10 10:03:20,756 [IPC Server handler 76 on default port 9861] INFO
> org.apache.hadoop.hdds.scm.node.SCMNodeManager: Update the operationalState
> saved in follower SCM for 4570a118-82ab-44fe-98c6-fad28dc9f622{ip:
> 10.169.59.142, host: ip-10-169-59-142.idata-server.shopee.io, ports:
> [CLIENT_RPC=9864, REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857,
> RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default-rack,
> certSerialId: null, persistedOpState: IN_SERVICE,
> persistedOpStateExpiryEpochSec: 0} as the reported value does not match the
> value stored in SCM (ENTERING_MAINTENANCE, 0)
> 2025-02-10 10:03:20,756
> [EventQueue-HealthyReadonlyNodeForHealthyReadOnlyNodeHandler] INFO
> org.apache.hadoop.hdds.scm.node.HealthyReadOnlyNodeHandler: Datanode
> 4570a118-82ab-44fe-98c6-fad28dc9f622{ip: 10.169.59.142, host:
> ip-10-169-59-142.idata-server.shopee.io, ports: [CLIENT_RPC=9864,
> REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856,
> STANDALONE=9859], networkLocation: /default-rack, certSerialId: null,
> persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0} moved to
> HEALTHY READONLY state.
> 2025-02-10 10:03:20,756
> [EventQueue-HealthyReadonlyNodeForHealthyReadOnlyNodeHandler] ERROR
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor: Error on execution
> message 4570a118-82ab-44fe-98c6-fad28dc9f622{ip: 10.169.59.142, host:
> ip-10-169-59-142.idata-server.shopee.io, ports: [CLIENT_RPC=9864,
> REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856,
> STANDALONE=9859], networkLocation: /default-rack, certSerialId: null,
> persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}
> org.apache.hadoop.hdds.scm.net.NetworkTopology$InvalidTopologyException:
> Failed to add /default-rack/ip-10-169-59-142.idata-server.shopee.io: Its path
> depth is not 4
> at
> org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.add(NetworkTopologyImpl.java:101)
> at
> org.apache.hadoop.hdds.scm.node.HealthyReadOnlyNodeHandler.onMessage(HealthyReadOnlyNodeHandler.java:75)
> at
> org.apache.hadoop.hdds.scm.node.HealthyReadOnlyNodeHandler.onMessage(HealthyReadOnlyNodeHandler.java:39)
> at
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748) {code}
> The error is caused by incorrect rack information reported in the Datanode's
> heartbeat. Instead, we should use SCM's view of the Datanode here.
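
The idea in the description can be illustrated with a minimal, self-contained sketch. It is not the actual patch and uses no Ozone classes: NodeView, Registry, and this setNodeOperationalState signature are hypothetical stand-ins. The assumption is that only the UUID of the heartbeat-reported object is trusted, while the network location is taken from the entry SCM already holds.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: none of these types are real Ozone classes.
final class ScmViewSketch {

  /** Minimal stand-in for a datanode description. */
  static final class NodeView {
    final UUID id;
    final String networkLocation;
    final String opState;

    NodeView(UUID id, String networkLocation, String opState) {
      this.id = id;
      this.networkLocation = networkLocation;
      this.opState = opState;
    }
  }

  /** Stand-in for SCM's node registry, keyed by datanode UUID. */
  static final class Registry {
    private final Map<UUID, NodeView> nodes = new HashMap<>();

    void register(NodeView node) {
      nodes.put(node.id, node);
    }

    /**
     * Applies the new operational state and returns the view to hand to
     * downstream handlers. Only the UUID of the reported object is used;
     * the network location comes from the registered (SCM-side) entry.
     */
    NodeView setNodeOperationalState(NodeView reported, String newState) {
      NodeView registered = nodes.get(reported.id);
      if (registered == null) {
        throw new IllegalArgumentException("Unknown datanode: " + reported.id);
      }
      NodeView updated =
          new NodeView(registered.id, registered.networkLocation, newState);
      nodes.put(updated.id, updated);
      return updated;
    }
  }

  public static void main(String[] args) {
    Registry scm = new Registry();
    UUID id = UUID.fromString("4570a118-82ab-44fe-98c6-fad28dc9f622");
    scm.register(new NodeView(id, "/DC/RACK", "ENTERING_MAINTENANCE"));

    // The heartbeat copy carries the default rack, as in the log above.
    NodeView fromHeartbeat = new NodeView(id, "/default-rack", "IN_SERVICE");
    NodeView forHandlers =
        scm.setNodeOperationalState(fromHeartbeat, fromHeartbeat.opState);

    System.out.println(forHandlers.networkLocation + " " + forHandlers.opState);
  }
}
```

In this sketch the object passed on to handlers keeps the /DC/RACK location, so a topology add keyed on that location would not see the /default-rack path from the heartbeat.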