Ritesh H Shukla created HDDS-6109:
-------------------------------------
Summary: Ozone Client should retry unflushed buffers on new
pipeline on GroupMismatch Exception.
Key: HDDS-6109
URL: https://issues.apache.org/jira/browse/HDDS-6109
Project: Apache Ozone
Issue Type: Bug
Components: Ozone Client
Reporter: Ritesh H Shukla
Assignee: Ritesh H Shukla
Currently, if the pipeline is closed in the middle of a write, the client gets a
GroupMismatch exception, which results in an exception for the caller using the client.
https://github.com/kerneltime/ozone/blob/a43735eba7a2eea7769ea146a136aebae3b8b84b/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java#L175-L284
{quote}2021-12-14 14:38:49,683 [Command processor thread] INFO
server.RaftServer$Division (ServerState.java:close(419)) -
2d07f9d1-28a1-49bc-a902-d2a1291cbdf1@group-89F59A98FF87: closes. applyIndex: 2
2021-12-14 14:38:49,683
[2d07f9d1-28a1-49bc-a902-d2a1291cbdf1@group-89F59A98FF87-SegmentedRaftLogWorker]
INFO segmented.SegmentedRaftLogWorker (SegmentedRaftLogWorker.java:run(327)) -
2d07f9d1-28a1-49bc-a902-d2a1291cbdf1@group-89F59A98FF87-SegmentedRaftLogWorker
was interrupted, exiting. There are 0 tasks remaining in the queue.
2021-12-14 14:38:49,686 [Command processor thread] INFO
segmented.SegmentedRaftLogWorker (SegmentedRaftLogWorker.java:close(237)) -
2d07f9d1-28a1-49bc-a902-d2a1291cbdf1@group-89F59A98FF87-SegmentedRaftLogWorker
close()
2021-12-14 14:38:49,691 [Command processor thread] INFO
server.RaftServer$Division (RaftServerImpl.java:groupRemove(382)) -
2d07f9d1-28a1-49bc-a902-d2a1291cbdf1@group-89F59A98FF87: Succeed to remove
RaftStorageDirectory Storage Directory
/Users/ritesh/IdeaProjects/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-4ef3409b-a4e4-4564-b417-667c302b8de2/datanode-1/data/ratis/pipelineXXX
2021-12-14 14:38:49,691 [Command processor thread] INFO
commandhandler.ClosePipelineCommandHandler
(ClosePipelineCommandHandler.java:handle(78)) - Close Pipeline
PipelineID=pipelineXXX command on datanode 2d07f9d1-28a1-49bc-a902-d2a1291cbdf1.
2021-12-14 14:38:49,728 [EventQueue-PipelineReportForPipelineReportHandler]
INFO pipeline.PipelineReportHandler
(PipelineReportHandler.java:processPipelineReport(113)) - Reported pipeline
PipelineID=pipelineXXX is not found
2021-12-14 14:38:51,926 [Listener at 127.0.0.1/52003] WARN
scm.XceiverClientRatis (XceiverClientRatis.java:watchForCommit(266)) - 3 way
commit failed on pipeline Pipeline[ Id: pipelineXXX, Nodes:
8c998abc-6bf8-426d-ae41-6d32c225dbb3\{ip: 192.168.86.246, host: 21884.lan,
ports: [REPLICATION=52022, RATIS=52023, RATIS_ADMIN=52023, RATIS_SERVER=52023,
STANDALONE=52024], networkLocation: /default-rack, certSerialId: null,
persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec:
0}82f2254c-9af0-4452-9f3a-881c3df8ce31\{ip: 192.168.86.246, host: 21884.lan,
ports: [REPLICATION=52016, RATIS=52017, RATIS_ADMIN=52017, RATIS_SERVER=52017,
STANDALONE=52018], networkLocation: /default-rack, certSerialId: null,
persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec:
0}2d07f9d1-28a1-49bc-a902-d2a1291cbdf1\{ip: 192.168.86.246, host: 21884.lan,
ports: [REPLICATION=52019, RATIS=52020, RATIS_ADMIN=52020, RATIS_SERVER=52020,
STANDALONE=52021], networkLocation: /default-rack, certSerialId: null,
persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0},
ReplicationConfig: RATIS/THREE, State:OPEN,
leaderId:82f2254c-9af0-4452-9f3a-881c3df8ce31,
CreationTimestamp2021-12-14T14:38:39.305-08:00[America/Los_Angeles]]
java.util.concurrent.ExecutionException:
org.apache.ratis.protocol.exceptions.RaftRetryFailureException: Failed
RaftClientRequest:client-214E4F4A64F9->8c998abc-6bf8-426d-ae41-6d32c225dbb3@group-89F59A98FF87,
cid=37, seq=0, Watch-ALL_COMMITTED(6), null for 2 attempts with
RequestTypeDependentRetryPolicy\{WRITE->org.apache.ratis.retry.ExceptionDependentRetry@7754720f,
WATCH->org.apache.ratis.retry.ExceptionDependentRetry@514c16e5}
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
at org.apache.hadoop.hdds.scm.XceiverClientRatis.watchForCommit(XceiverClientRatis.java:263)
at org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchForCommit(CommitWatcher.java:199)
at org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchOnLastIndex(CommitWatcher.java:166)
at org.apache.hadoop.hdds.scm.storage.RatisBlockOutputStream.sendWatchForCommit(RatisBlockOutputStream.java:101)
at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.watchForCommit(BlockOutputStream.java:373)
at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:533)
at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:547)
at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:137)
at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleStreamAction(KeyOutputStream.java:495)
at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:469)
at org.apache.hadoop.ozone.client.io.KeyOutputStream.close(KeyOutputStream.java:522)
at org.apache.hadoop.ozone.client.io.OzoneOutputStream.close(OzoneOutputStream.java:61)
at org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures.testContainerStateMachineTransitionOnUnhealthyReplicas(TestContainerStateMachineFailures.java:225)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:235)
at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:54)
Caused by: org.apache.ratis.protocol.exceptions.RaftRetryFailureException:
Failed
RaftClientRequest:client-214E4F4A64F9->8c998abc-6bf8-426d-ae41-6d32c225dbb3@group-89F59A98FF87,
cid=37, seq=0, Watch-ALL_COMMITTED(6), null for 2 attempts with
RequestTypeDependentRetryPolicy\{WRITE->org.apache.ratis.retry.ExceptionDependentRetry@7754720f,
WATCH->org.apache.ratis.retry.ExceptionDependentRetry@514c16e5}
at org.apache.ratis.client.impl.RaftClientImpl.noMoreRetries(RaftClientImpl.java:272)
{quote}
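For illustration only, the behaviour requested in the summary could look roughly like the sketch below. Nothing here is Ozone or Ratis code: Pipeline, PipelineAllocator, GroupMismatch and writeWithRetry are hypothetical stand-ins, and GroupMismatch is an unchecked exception only to keep the example short. The idea shown is that chunks not yet acknowledged stay in the client's buffer list, and when the current pipeline rejects a write with a group mismatch, the client asks for a new pipeline and resends from the first unacknowledged chunk instead of failing the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RetryOnGroupMismatchSketch {

  /** Hypothetical stand-in for a group-mismatch failure (unchecked for brevity). */
  static class GroupMismatch extends RuntimeException {
    GroupMismatch(String message) {
      super(message);
    }
  }

  /** Hypothetical pipeline that accepts a fixed number of chunks and then behaves as closed. */
  static class Pipeline {
    final String id;
    final int closeAfter;
    final List<String> accepted = new ArrayList<>();

    Pipeline(String id, int closeAfter) {
      this.id = id;
      this.closeAfter = closeAfter;
    }

    void write(String chunk) {
      if (accepted.size() >= closeAfter) {
        // Simulates the datanode having removed the Raft group for this pipeline.
        throw new GroupMismatch("group not found for pipeline " + id);
      }
      accepted.add(chunk);
    }
  }

  /** Hypothetical source of replacement pipelines. */
  interface PipelineAllocator {
    Pipeline allocate();
  }

  /**
   * Writes every unflushed chunk, switching to a newly allocated pipeline when the
   * current one reports a group mismatch. Returns the pipeline that took the last chunk.
   */
  static Pipeline writeWithRetry(List<String> unflushed, Pipeline current,
      PipelineAllocator allocator, int maxRetries) {
    int retries = 0;
    int next = 0; // index of the first chunk that has not been accepted yet
    while (next < unflushed.size()) {
      try {
        current.write(unflushed.get(next));
        next++;
      } catch (GroupMismatch e) {
        if (retries >= maxRetries) {
          throw e; // give up and surface the failure to the caller
        }
        retries++;
        current = allocator.allocate(); // resend the same chunk on the new pipeline
      }
    }
    return current;
  }

  public static void main(String[] args) {
    List<String> chunks = Arrays.asList("a", "b", "c", "d");
    Pipeline first = new Pipeline("p1", 2); // closes after two chunks
    Pipeline last = writeWithRetry(chunks, first,
        () -> new Pipeline("p2", Integer.MAX_VALUE), 1);
    System.out.println(first.id + " accepted " + first.accepted);
    System.out.println(last.id + " accepted " + last.accepted);
  }
}
```

The sketch bounds the number of pipeline switches with maxRetries so that a cluster that keeps closing pipelines still produces an error eventually. A real fix would also have to decide which buffers count as unflushed, which this example sidesteps by treating each chunk as accepted as soon as write returns.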