[
https://issues.apache.org/jira/browse/HDDS-13073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Attila Doroszlai resolved HDDS-13073.
-------------------------------------
Fix Version/s: 2.1.0
Resolution: Fixed
> Checksums verifier provides wrong results as it always verifies the data of
> only one node
> -----------------------------------------------------------------------------------------
>
> Key: HDDS-13073
> URL: https://issues.apache.org/jira/browse/HDDS-13073
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Rishabh Patel
> Assignee: Rishabh Patel
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.1.0
>
>
> For a 3-way Ratis-replicated key, the checksums tool returns wrong results.
> The verification result for a key is always identical across all three
> replicas: either every replica fails the checksum check or none does, even
> when only one replica should fail.
>
> This can be traced back to the way the
> [pipeline|https://github.com/apache/ozone/blob/1825cdf6057ae4ac2d0bcbdcfb0bed1302054e9e/hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/debug/replicas/ChecksumVerifier.java#L62-L66]
> is created for the checksum verification.
> {code:java}
> Pipeline.Builder pipelineBuilder =
>     Pipeline.newBuilder(keyLocation.getPipeline())
>         .setReplicationConfig(StandaloneReplicationConfig.getInstance(ONE))
>         .setNodes(Collections.singletonList(datanode))
>         .setLeaderId(datanode.getID())
>         .setSuggestedLeaderId(datanode.getID())
>         .setReplicaIndexes(Collections.singletonMap(datanode, replicaIndex));
> {code}
>
> When a client is created using this pipeline, it is
> [cached|https://github.com/apache/ozone/blob/1825cdf6057ae4ac2d0bcbdcfb0bed1302054e9e/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientManager.java#L150-L161]:
>
> {code:java}
> protected XceiverClientSpi getClient(Pipeline pipeline, boolean topologyAware)
>     throws IOException {
>   try {
>     // create a different client for each pipeline node based on
>     // network topology
>     String key = getPipelineCacheKey(pipeline, topologyAware);
>     return clientCache.get(key, () -> newClient(pipeline));
>   } catch (Exception e) {
>     throw new IOException(
>         "Exception getting XceiverClient: " + e, e);
>   }
> }
> {code}
>
> The key for the cached entry is generated in
> [getPipelineCacheKey|https://github.com/apache/ozone/blob/1825cdf6057ae4ac2d0bcbdcfb0bed1302054e9e/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientManager.java#L163-L165]:
> {code:java}
> String key = pipeline.getId().getId().toString() + pipeline.getType();
> {code}
>
> When the pipeline is created via
> {{Pipeline.newBuilder(keyLocation.getPipeline())}}, it inherits the
> original pipeline's id. The per-datanode pipelines built for the three
> replicas therefore all map to the same cache key, so the client cached for
> the first datanode is reused for every subsequent checksum verification.
>
> Example debug log lines. Note that the node expected in the pipeline
> (91480a1b-...) differs from the node of the returned client (1bf6b67b-...).
> {code:java}
> 2025-05-19 06:01:18,857 [main] INFO scm.XceiverClientManager
> (XceiverClientManager.java:getClient(156)) - ATTENTION! getting possibly
> cached XceiverClient for pipeline Pipeline{ Id:
> c74e00f6-66d0-4bd0-9ee7-05b9c258bd5e, Nodes: [
> {91480a1b-f789-4147-922d-6790aef31cf1(localhost/127.0.0.1), ReplicaIndex:
> 0},], ReplicationConfig: STANDALONE/ONE, State:OPEN,
> leaderId:91480a1b-f789-4147-922d-6790aef31cf1,
> CreationTimestamp2025-05-19T06:01:15.106-07:00[America/Los_Angeles]} with key
> c74e00f6-66d0-4bd0-9ee7-05b9c258bd5eSTAND_ALONE
> 2025-05-19 06:01:18,857 [main] INFO scm.XceiverClientManager
> (XceiverClientManager.java:getClient(158)) - ATTENTION! returning cached
> XceiverClient for node 1bf6b67b-5816-4e0e-90b4-9c42dd2b5df7 {code}
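
The cache-key collision described above can be reproduced without a cluster. The following is a minimal, self-contained sketch, not Ozone code: FakePipeline, contactedNodes, and idTypeAndNodeKey are hypothetical names made up for illustration, and a client is modelled as nothing more than the datanode id it was created for. The node-aware key is only an assumed alternative used to show the contrast; it is not necessarily the change that resolved this issue.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class PipelineCacheKeyDemo {

  /** Hypothetical stand-in for a single-node pipeline; not the Ozone Pipeline class. */
  static final class FakePipeline {
    final UUID id;
    final String type;
    final UUID datanodeId;

    FakePipeline(UUID id, String type, UUID datanodeId) {
      this.id = id;
      this.type = type;
      this.datanodeId = datanodeId;
    }
  }

  /** Same shape as the key quoted in the issue: pipeline id followed by pipeline type. */
  static String idAndTypeKey(FakePipeline p) {
    return p.id.toString() + p.type;
  }

  /** Assumed alternative, for illustration only: the key also names the target datanode. */
  static String idTypeAndNodeKey(FakePipeline p) {
    return idAndTypeKey(p) + "/" + p.datanodeId;
  }

  /** One single-node pipeline per replica, all inheriting the same pipeline id. */
  static List<FakePipeline> threeReplicaPipelines() {
    UUID sharedId = UUID.randomUUID();
    List<FakePipeline> pipelines = new ArrayList<>();
    for (int i = 0; i < 3; i++) {
      pipelines.add(new FakePipeline(sharedId, "STAND_ALONE", UUID.randomUUID()));
    }
    return pipelines;
  }

  /**
   * Fetches a "client" for each pipeline through a cache keyed by keyFn and
   * returns the datanode that each returned client is actually bound to.
   */
  static List<UUID> contactedNodes(List<FakePipeline> pipelines,
      Function<FakePipeline, String> keyFn) {
    Map<String, UUID> clientCache = new ConcurrentHashMap<>();
    List<UUID> contacted = new ArrayList<>();
    for (FakePipeline p : pipelines) {
      // A cache hit returns the client created for an earlier pipeline.
      contacted.add(clientCache.computeIfAbsent(keyFn.apply(p), k -> p.datanodeId));
    }
    return contacted;
  }

  public static void main(String[] args) {
    List<FakePipeline> pipelines = threeReplicaPipelines();
    int shared = new HashSet<>(
        contactedNodes(pipelines, PipelineCacheKeyDemo::idAndTypeKey)).size();
    int perNode = new HashSet<>(
        contactedNodes(pipelines, PipelineCacheKeyDemo::idTypeAndNodeKey)).size();
    System.out.println("id+type key: " + shared + " distinct node(s) contacted");
    System.out.println("id+type+node key: " + perNode + " distinct node(s) contacted");
  }
}
```

With the id+type key, all three lookups resolve to the client of the first datanode, which matches the behaviour in the log lines above. Any key (or pipeline id) that differs per datanode avoids the reuse in this model.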