[
https://issues.apache.org/jira/browse/HDDS-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HDDS-6109:
---------------------------------
Labels: pull-request-available (was: )
> Ozone Client should retry unflushed buffers on new pipeline on GroupMismatch
> Exception.
> ---------------------------------------------------------------------------------------
>
> Key: HDDS-6109
> URL: https://issues.apache.org/jira/browse/HDDS-6109
> Project: Apache Ozone
> Issue Type: Bug
> Components: Ozone Client
> Reporter: Ritesh H Shukla
> Assignee: Ritesh H Shukla
> Priority: Major
> Labels: pull-request-available
>
> Currently, if the pipeline is closed in the middle of a write, the client gets
> a GroupMismatch exception, which surfaces as an exception in the code using
> the client.
> https://github.com/kerneltime/ozone/blob/a43735eba7a2eea7769ea146a136aebae3b8b84b/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java#L175-L284
> {code}
> 2021-12-14 14:38:49,683 [Command processor thread] INFO
> server.RaftServer$Division (ServerState.java:close(419)) -
> 2d07f9d1-28a1-49bc-a902-d2a1291cbdf1@group-89F59A98FF87: closes. applyIndex: 2
> 2021-12-14 14:38:49,683
> [2d07f9d1-28a1-49bc-a902-d2a1291cbdf1@group-89F59A98FF87-SegmentedRaftLogWorker]
> INFO segmented.SegmentedRaftLogWorker (SegmentedRaftLogWorker.java:run(327))
> -
> 2d07f9d1-28a1-49bc-a902-d2a1291cbdf1@group-89F59A98FF87-SegmentedRaftLogWorker
> was interrupted, exiting. There are 0 tasks remaining in the queue.
> 2021-12-14 14:38:49,686 [Command processor thread] INFO
> segmented.SegmentedRaftLogWorker (SegmentedRaftLogWorker.java:close(237)) -
> 2d07f9d1-28a1-49bc-a902-d2a1291cbdf1@group-89F59A98FF87-SegmentedRaftLogWorker
> close()
> 2021-12-14 14:38:49,691 [Command processor thread] INFO
> server.RaftServer$Division (RaftServerImpl.java:groupRemove(382)) -
> 2d07f9d1-28a1-49bc-a902-d2a1291cbdf1@group-89F59A98FF87: Succeed to remove
> RaftStorageDirectory Storage Directory
> /Users/ritesh/IdeaProjects/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-4ef3409b-a4e4-4564-b417-667c302b8de2/datanode-1/data/ratis/pipelineXXX
> 2021-12-14 14:38:49,691 [Command processor thread] INFO
> commandhandler.ClosePipelineCommandHandler
> (ClosePipelineCommandHandler.java:handle(78)) - Close Pipeline
> PipelineID=pipelineXXX command on datanode
> 2d07f9d1-28a1-49bc-a902-d2a1291cbdf1.
> 2021-12-14 14:38:49,728 [EventQueue-PipelineReportForPipelineReportHandler]
> INFO pipeline.PipelineReportHandler
> (PipelineReportHandler.java:processPipelineReport(113)) - Reported pipeline
> PipelineID=pipelineXXX is not found
> 2021-12-14 14:38:51,926 [Listener at 127.0.0.1/52003] WARN
> scm.XceiverClientRatis (XceiverClientRatis.java:watchForCommit(266)) - 3 way
> commit failed on pipeline Pipeline[ Id: pipelineXXX, Nodes:
> 8c998abc-6bf8-426d-ae41-6d32c225dbb3\{ip: 192.168.86.246, host: 21884.lan,
> ports: [REPLICATION=52022, RATIS=52023, RATIS_ADMIN=52023,
> RATIS_SERVER=52023, STANDALONE=52024], networkLocation: /default-rack,
> certSerialId: null, persistedOpState: IN_SERVICE,
> persistedOpStateExpiryEpochSec: 0}82f2254c-9af0-4452-9f3a-881c3df8ce31\{ip:
> 192.168.86.246, host: 21884.lan, ports: [REPLICATION=52016, RATIS=52017,
> RATIS_ADMIN=52017, RATIS_SERVER=52017, STANDALONE=52018], networkLocation:
> /default-rack, certSerialId: null, persistedOpState: IN_SERVICE,
> persistedOpStateExpiryEpochSec: 0}2d07f9d1-28a1-49bc-a902-d2a1291cbdf1\{ip:
> 192.168.86.246, host: 21884.lan, ports: [REPLICATION=52019, RATIS=52020,
> RATIS_ADMIN=52020, RATIS_SERVER=52020, STANDALONE=52021], networkLocation:
> /default-rack, certSerialId: null, persistedOpState: IN_SERVICE,
> persistedOpStateExpiryEpochSec: 0}, ReplicationConfig: RATIS/THREE,
> State:OPEN, leaderId:82f2254c-9af0-4452-9f3a-881c3df8ce31,
> CreationTimestamp2021-12-14T14:38:39.305-08:00[America/Los_Angeles]]
> java.util.concurrent.ExecutionException:
> org.apache.ratis.protocol.exceptions.RaftRetryFailureException: Failed
> RaftClientRequest:client-214E4F4A64F9->8c998abc-6bf8-426d-ae41-6d32c225dbb3@group-89F59A98FF87,
> cid=37, seq=0, Watch-ALL_COMMITTED(6), null for 2 attempts with
> RequestTypeDependentRetryPolicy\{WRITE->org.apache.ratis.retry.ExceptionDependentRetry@7754720f,
> WATCH->org.apache.ratis.retry.ExceptionDependentRetry@514c16e5}
> at
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> at
> org.apache.hadoop.hdds.scm.XceiverClientRatis.watchForCommit(XceiverClientRatis.java:263)
> at
> org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchForCommit(CommitWatcher.java:199)
> at
> org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchOnLastIndex(CommitWatcher.java:166)
> at
> org.apache.hadoop.hdds.scm.storage.RatisBlockOutputStream.sendWatchForCommit(RatisBlockOutputStream.java:101)
> at
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.watchForCommit(BlockOutputStream.java:373)
> at
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:533)
> at
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:547)
> at
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:137)
> at
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleStreamAction(KeyOutputStream.java:495)
> at
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:469)
> at
> org.apache.hadoop.ozone.client.io.KeyOutputStream.close(KeyOutputStream.java:522)
> at
> org.apache.hadoop.ozone.client.io.OzoneOutputStream.close(OzoneOutputStream.java:61)
> at
> org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures.testContainerStateMachineTransitionOnUnhealthyReplicas(TestContainerStateMachineFailures.java:225)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> at
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> at
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> at
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> at
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
> at
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
> at
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:235)
> at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:54)
> Caused by: org.apache.ratis.protocol.exceptions.RaftRetryFailureException:
> Failed
> RaftClientRequest:client-214E4F4A64F9->8c998abc-6bf8-426d-ae41-6d32c225dbb3@group-89F59A98FF87,
> cid=37, seq=0, Watch-ALL_COMMITTED(6), null for 2 attempts with
> RequestTypeDependentRetryPolicy\{WRITE->org.apache.ratis.retry.ExceptionDependentRetry@7754720f,
> WATCH->org.apache.ratis.retry.ExceptionDependentRetry@514c16e5}
> at
> org.apache.ratis.client.impl.RaftClientImpl.noMoreRetries(RaftClientImpl.java:272)
> {code}
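
The behaviour requested in the issue title (retrying unflushed buffers on a new pipeline after a group mismatch) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the Ozone client code: `Pipeline`, `Client`, and `GroupMismatch` are hypothetical stand-ins defined here, and the real client may handle pipeline allocation, exclusion, and commit tracking quite differently.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class UnflushedRetrySketch {

  /** Hypothetical stand-in for the group-mismatch failure (unchecked for brevity). */
  static class GroupMismatch extends RuntimeException {
    GroupMismatch(String message) {
      super(message);
    }
  }

  /** Hypothetical pipeline: commits chunks until it has been closed. */
  static class Pipeline {
    final String id;
    final List<String> committed = new ArrayList<>();
    boolean closed;

    Pipeline(String id) {
      this.id = id;
    }

    void commit(List<String> chunks) {
      if (closed) {
        throw new GroupMismatch("group not found for pipeline " + id);
      }
      committed.addAll(chunks);
    }
  }

  /** Hypothetical client that buffers writes and replays them on failure. */
  static class Client {
    private final Deque<Pipeline> candidates;
    private final List<String> unflushed = new ArrayList<>();
    private Pipeline current;

    Client(List<Pipeline> pipelines) {
      if (pipelines.isEmpty()) {
        throw new IllegalArgumentException("at least one pipeline is required");
      }
      candidates = new ArrayDeque<>(pipelines);
      current = candidates.poll();
    }

    /** Buffers a chunk; nothing is sent until flush(). */
    void write(String chunk) {
      unflushed.add(chunk);
    }

    /**
     * Commits the buffered chunks. On a group mismatch the same buffer is
     * replayed on the next candidate pipeline instead of failing the caller.
     * The failure is rethrown only when no candidate is left.
     */
    Pipeline flush() {
      while (true) {
        try {
          current.commit(unflushed);
          unflushed.clear();
          return current;
        } catch (GroupMismatch e) {
          Pipeline next = candidates.poll();
          if (next == null) {
            throw e;
          }
          current = next;
        }
      }
    }
  }

  public static void main(String[] args) {
    Pipeline first = new Pipeline("p1");
    Pipeline second = new Pipeline("p2");
    Client client = new Client(Arrays.asList(first, second));

    client.write("chunk-0");
    client.flush();            // committed on p1
    first.closed = true;       // pipeline is closed in the middle of the write
    client.write("chunk-1");
    Pipeline used = client.flush();

    System.out.println(used.id + " " + used.committed); // prints "p2 [chunk-1]"
  }
}
```

In this sketch only data that was still unflushed when the pipeline closed is replayed; chunks already committed on the old pipeline stay where they are.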