[jira] [Updated] (HDDS-6890) EC: Fix potential wrong replica read with over-replicated container.

ASF GitHub Bot (Jira) Thu, 16 Jun 2022 02:07:07 -0700


     [ 
https://issues.apache.org/jira/browse/HDDS-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


ASF GitHub Bot updated HDDS-6890:
---------------------------------
    Labels: pull-request-available  (was: )

> EC: Fix potential wrong replica read with over-replicated container.
> --------------------------------------------------------------------
>
>                 Key: HDDS-6890
>                 URL: https://issues.apache.org/jira/browse/HDDS-6890
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Mark Gui
>            Assignee: Mark Gui
>            Priority: Major
>              Labels: pull-request-available
>
> We've encountered a potential issue on the read path when we have 
> over-replicated EC container replicas after Offline Recovery.
> Here is the scenario:
>  # 20 DNs, EC 10+4
>  # A background script keep writing and reading keys.(freon ockg & ockv)
>  # A background script that randomly kills DNs (< 4 for each round), wait for 
> all containers to recover to the expected number of replicas(14). Then go for 
> next round.
>  # After there are 8 killed DNs, restart all dead DNs before.
>  # Then we have potentially over-replicated containers.
>  # At that time, there are concurrent ockv and container delete commands in 
> the cluster.
>  # And the ockv reports a digest mismatch during the short period of several 
> minutes.
>  # But after all the deletions are done and all containers return to an 
> expected number of replicas(14), the ockv command returns successfully.
> Outputs for the ockv command below:
> {code:java}
> 2022-06-10 13:21:27,923 [main] INFO storage.BlockInputStream: Unable to read 
> information for block conID: 12 locID: 109611004723202307 bcsId: 0 from 
> pipeline PipelineID=35db70e6-f362-4b3a-ab98-a89509547455: ContainerID 12 does 
> not exist
> 2022-06-10 13:21:28,613 [Thread-4] INFO freon.ProgressBar: Progress: 0.00 % 
> (0 out of 10)
> 2022-06-10 13:21:29,614 [Thread-4] INFO freon.ProgressBar: Progress: 0.00 % 
> (0 out of 10)
> 2022-06-10 13:21:30,614 [Thread-4] INFO freon.ProgressBar: Progress: 0.00 % 
> (0 out of 10)
> 2022-06-10 13:21:31,922 [Thread-4] INFO freon.ProgressBar: Progress: 0.00 % 
> (0 out of 10)
> 2022-06-10 13:21:32,922 [Thread-4] INFO freon.ProgressBar: Progress: 0.00 % 
> (0 out of 10)
> 2022-06-10 13:21:33,922 [Thread-4] INFO freon.ProgressBar: Progress: 0.00 % 
> (0 out of 10)
> 2022-06-10 13:21:33,978 [pool-2-thread-1] ERROR scm.XceiverClientGrpc: Failed 
> to execute command cmdType: GetBlock
> traceID: ""
> containerID: 12
> datanodeUuid: "5b542d9f-5e91-4cc2-af48-af3c33b8c472"
> getBlock {
>   blockID {
>     containerID: 12
>     localID: 109611004723202307
>     blockCommitSequenceId: 0
>   }
> }
>  on the pipeline Pipeline[ Id: 8151bbc7-672c-42d6-9c40-c80ce69d4b3f, Nodes: 
> 5b542d9f-5e91-4cc2-af48-af3c33b8c472{ip: 9.27.142.97, host: 9.27.142.97, 
> ports: [REPLICATION=3096, RATIS=3092, RATIS_ADMIN=3095, RATIS_SERVER=3094, 
> STANDALONE=3093], networkLocation: /default-rack, certSerialId: null, 
> persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}, 
> ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, 
> CreationTimestamp2022-06-10T13:21:33.921+08:00[Asia/Shanghai]].
> 2022-06-10 13:21:34,014 [pool-2-thread-1] INFO storage.BlockInputStream: 
> Unable to read information for block conID: 12 locID: 109611004723202307 
> bcsId: 0 from pipeline PipelineID=8151bbc7-672c-42d6-9c40-c80ce69d4b3f: 
> ContainerID 12 does not exist
> 2022-06-10 13:21:34,142 [pool-2-thread-5] ERROR scm.XceiverClientGrpc: Failed 
> to execute command cmdType: GetBlock
> traceID: ""
> containerID: 12
> datanodeUuid: "5b542d9f-5e91-4cc2-af48-af3c33b8c472"
> getBlock {
>   blockID {
>     containerID: 12
>     localID: 109611004723202316
>     blockCommitSequenceId: 0
>   }
> }
>  on the pipeline Pipeline[ Id: f421fc76-eebb-4474-8c77-c57944144df6, Nodes: 
> 5b542d9f-5e91-4cc2-af48-af3c33b8c472{ip: 9.27.142.97, host: 9.27.142.97, 
> ports: [REPLICATION=3096, RATIS=3092, RATIS_ADMIN=3095, RATIS_SERVER=3094, 
> STANDALONE=3093], networkLocation: /default-rack, certSerialId: null, 
> persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}, 
> ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, 
> CreationTimestamp2022-06-10T13:21:34.074+08:00[Asia/Shanghai]].
> 2022-06-10 13:21:34,143 [pool-2-thread-5] INFO storage.BlockInputStream: 
> Unable to read information for block conID: 12 locID: 109611004723202316 
> bcsId: 0 from pipeline PipelineID=f421fc76-eebb-4474-8c77-c57944144df6: 
> ContainerID 12 does not exist
> 2022-06-10 13:21:34,315 [pool-2-thread-7] ERROR scm.XceiverClientGrpc: Failed 
> to execute command cmdType: GetBlock
> traceID: ""
> containerID: 12
> datanodeUuid: "5b542d9f-5e91-4cc2-af48-af3c33b8c472"
> getBlock {
>   blockID {
>     containerID: 12
>     localID: 109611004723202310
>     blockCommitSequenceId: 0
>   }
> }
>  on the pipeline Pipeline[ Id: 21b59767-0ea9-4214-9f56-61fd31d795f2, Nodes: 
> 5b542d9f-5e91-4cc2-af48-af3c33b8c472{ip: 9.27.142.97, host: 9.27.142.97, 
> ports: [REPLICATION=3096, RATIS=3092, RATIS_ADMIN=3095, RATIS_SERVER=3094, 
> STANDALONE=3093], networkLocation: /default-rack, certSerialId: null, 
> persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}, 
> ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, 
> CreationTimestamp2022-06-10T13:21:34.215+08:00[Asia/Shanghai]].
> 2022-06-10 13:21:34,315 [pool-2-thread-7] INFO storage.BlockInputStream: 
> Unable to read information for block conID: 12 locID: 109611004723202310 
> bcsId: 0 from pipeline PipelineID=21b59767-0ea9-4214-9f56-61fd31d795f2: 
> ContainerID 12 does not exist
> 2022-06-10 13:21:35,402 [Thread-4] INFO freon.ProgressBar: Progress: 0.00 % 
> (0 out of 10)
> 2022-06-10 13:21:36,418 [Thread-4] INFO freon.ProgressBar: Progress: 0.00 % 
> (0 out of 10)
> 2022-06-10 13:21:37,419 [Thread-4] INFO freon.ProgressBar: Progress: 0.00 % 
> (0 out of 10)
> 2022-06-10 13:21:38,001 [pool-2-thread-5] ERROR freon.BaseFreonGenerator: 
> Error on executing task 4
> java.lang.IllegalStateException: Reference (=first) message digest doesn't 
> match with digest of stripe_align_1654804614/4
>         at 
> org.apache.hadoop.ozone.freon.OzoneClientKeyValidator.validateDigest(OzoneClientKeyValidator.java:169)
>         at 
> org.apache.hadoop.ozone.freon.OzoneClientKeyValidator.validateKey(OzoneClientKeyValidator.java:133)
>         at 
> org.apache.hadoop.ozone.freon.BaseFreonGenerator.tryNextTask(BaseFreonGenerator.java:184)
>         at 
> org.apache.hadoop.ozone.freon.BaseFreonGenerator.taskLoop(BaseFreonGenerator.java:164)
>         at 
> org.apache.hadoop.ozone.freon.BaseFreonGenerator.lambda$startTaskRunners$1(BaseFreonGenerator.java:147)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 2022-06-10 13:21:38,419 [Thread-4] INFO freon.ProgressBar: Progress: 10.00 % 
> (1 out of 10)
> 2022-06-10 13:21:39,419 [Thread-4] INFO freon.ProgressBar: Progress: 100.00 % 
> (10 out of 10)
> One ore more freon test is failed.
> 2022-06-10 13:21:39,823 [shutdown-hook-0] INFO metrics: type=TIMER, 
> name=key-validate, count=10, min=5918.240321, max=7710.826968, 
> mean=6891.752193143025, stddev=481.39096683740087, median=6798.957973, 
> p75=7302.544457, p95=7710.826968, p98=7710.826968, p99=7710.826968, 
> p999=7710.826968, mean_rate=1.1586365637687632, m1=0.0, m5=0.0, m15=0.0, 
> rate_unit=events/second, duration_unit=milliseconds
> 2022-06-10 13:21:39,824 [shutdown-hook-0] INFO freon.BaseFreonGenerator: 
> Total execution time (sec): 22
> 2022-06-10 13:21:39,824 [shutdown-hook-0] INFO freon.BaseFreonGenerator: 
> Failures: 1
> 2022-06-10 13:21:39,824 [shutdown-hook-0] INFO freon.BaseFreonGenerator: 
> Successful executions: 9
>  [check_object] size=52428800 prefix=stripe_align_1654804614 fail!!!! {code}
> I've got some analysis for this problem below:
> The key clue of this problem is the following log:
> {code:java}
> 2022-06-10 13:21:27,923 [main] INFO storage.BlockInputStream: Unable to read 
> information for block conID: 12 locID: 109611004723202307 bcsId: 0 from 
> pipeline PipelineID=35db70e6-f362-4b3a-ab98-a89509547455: ContainerID 12 does 
> not exist {code}
> This should be caused by a container delete command received before the 
> ReadChunk request reached the DN.
> And `BlockInputStream` will try to refresh the pipeline and retry read, see:
> {code:java}
> // BlockInputStream.java
> private void refreshPipeline(IOException cause) throws IOException {
>   LOG.info("Unable to read information for block {} from pipeline {}: {}",
>       blockID, pipeline.getId(), cause.getMessage());
>   if (refreshPipelineFunction != null) {
>     LOG.debug("Re-fetching pipeline for block {}", blockID);
>     Pipeline newPipeline = refreshPipelineFunction.apply(blockID);
>     if (newPipeline == null || newPipeline.sameDatanodes(pipeline)) {
>       LOG.warn("No new pipeline for block {}", blockID);
>       throw cause;
>     } else {
>       LOG.debug("New pipeline got for block {}", blockID);
>       this.pipeline = newPipeline;
>     }
>   } else {
>     throw cause;
>   }
> } {code}
> But actually here the pipeline we try to refresh from SCM is a {*}Standalone 
> pipeline{*}, so after SCM got this request via the SCM interface 
> `getContainerWithPipelineBatch`, it will return a EC pipeline with all the 
> DNs for this container in the class `ECPipelineProvider`, see:
> {code:java}
> // ECPipelineProvider.java
> @Override public Pipeline createForRead(
>     ECReplicationConfig replicationConfig,
>     Set<ContainerReplica> replicas) {
>   Map<DatanodeDetails, Integer> map = new HashMap<>();
>   List<DatanodeDetails> dns = new ArrayList<>(replicas.size());
>   for (ContainerReplica r : replicas) {
>     map.put(r.getDatanodeDetails(), r.getReplicaIndex());
>     dns.add(r.getDatanodeDetails());
>   }
>   return createPipelineInternal(replicationConfig, dns, map);
> }{code}
> So for the retry read, it will pick the first node to read regardless of the 
> replicaIndex, so we will get wrong chunks from the first node to read and 
> thus the mismatch problem above, see:
> {code:java}
> // XceiverClientGrpc.java
> public void connect() throws Exception {
>   // connect to the closest node, if closest node doesn't exist, delegate to
>   // first node, which is usually the leader in the pipeline.
>   DatanodeDetails dn = topologyAwareRead ? this.pipeline.getClosestNode() :
>       this.pipeline.getFirstNode();
>   // just make a connection to the picked datanode at the beginning
>   connectToDatanode(dn);
> }{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Updated] (HDDS-6890) EC: Fix potential wrong replica read with over-replicated container.

Reply via email to