duongkame opened a new pull request, #6968:
URL: https://github.com/apache/ozone/pull/6968
## What changes were proposed in this pull request?
The main concurrency problem with KeyOutputStream occurs when a block is
full and gets closed. The current implementation from HDDS-9844 only waits for
the `lastFlushFuture` before closing and cleaning up all resources (including
`xceiverClient`). When `hsync` is called concurrently, completion of
`lastFlushFuture` does not guarantee that all preceding flush futures have
completed. This causes errors such as:
- BOS is closed before watchForCommit.
```
java.io.IOException: BlockOutputStream has been closed.
    at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.checkOpen(BlockOutputStream.java:779)
    at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.watchForCommit(BlockOutputStream.java:474)
    at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$watchForCommitAsync$3(BlockOutputStream.java:693)
```
- BOS/CommitWatcher is cleaned up during watchForCommit.
```
java.util.concurrent.ExecutionException: java.lang.IllegalStateException
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
    at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlushInternal(BlockOutputStream.java:633)
    at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:607)
    at org.apache.hadoop.hdds.scm.storage.RatisBlockOutputStream.hsync(RatisBlockOutputStream.java:134)
    at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.lambda$hsync$0(BlockOutputStreamEntry.java:164)
    at org.apache.hadoop.util.MetricUtil.captureLatencyNs(MetricUtil.java:77)
    at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.hsync(BlockOutputStreamEntry.java:164)
    at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleStreamAction(KeyOutputStream.java:534)
    at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:497)
    at org.apache.hadoop.ozone.client.io.KeyOutputStream.hsync(KeyOutputStream.java:463)
    at org.apache.hadoop.ozone.client.io.OzoneOutputStream.hsync(OzoneOutputStream.java:118)
    at org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:184)
    at org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
    at org.apache.hadoop.fs.ozone.OzoneFSOutputStream.hsync(OzoneFSOutputStream.java:80)
    at org.apache.hadoop.fs.FSDataOutputStream.hsync(FSDataOutputStream.java:145)
    at org.apache.hadoop.fs.ozone.TestHSync.lambda$runConcurrentWriteHSync$9(TestHSync.java:695)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalStateException
    at com.google.common.base.Preconditions.checkState(Preconditions.java:496)
    at org.apache.hadoop.hdds.scm.storage.AbstractCommitWatcher.watchForCommit(AbstractCommitWatcher.java:146)
    at org.apache.hadoop.hdds.scm.storage.RatisBlockOutputStream.sendWatchForCommit(RatisBlockOutputStream.java:105)
    at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.watchForCommit(BlockOutputStream.java:477)
    at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$watchForCommitAsync$3(BlockOutputStream.java:693)
    at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670)
....
```
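To illustrate the race described above, here is a minimal sketch of one way to wait for every outstanding flush rather than only the most recent one. It is not the code in this PR: `PendingFlushes`, `track`, and `allCompleted` are hypothetical names, and the sketch assumes each flush is represented by a `CompletableFuture<Void>`.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper (not an Ozone class): tracks every in-flight flush future. */
class PendingFlushes {
  // Concurrent set, so writer and hsync threads can register flushes in parallel.
  private final Set<CompletableFuture<Void>> pending = ConcurrentHashMap.newKeySet();

  /** Registers a flush future; it removes itself from the set once it completes. */
  CompletableFuture<Void> track(CompletableFuture<Void> flush) {
    pending.add(flush);
    flush.whenComplete((v, e) -> pending.remove(flush));
    return flush;
  }

  /** Returns a future that completes only after all flushes tracked so far complete. */
  CompletableFuture<Void> allCompleted() {
    return CompletableFuture.allOf(pending.toArray(new CompletableFuture[0]));
  }

  public static void main(String[] args) {
    PendingFlushes flushes = new PendingFlushes();
    CompletableFuture<Void> first = flushes.track(new CompletableFuture<>());
    CompletableFuture<Void> last = flushes.track(new CompletableFuture<>());

    // The most recent flush finishes first, as can happen with concurrent hsync.
    last.complete(null);
    CompletableFuture<Void> all = flushes.allCompleted();
    System.out.println("last done: " + last.isDone() + ", all done: " + all.isDone());

    // Only once the earlier flush also finishes is it safe to clean up resources.
    first.complete(null);
    System.out.println("all done: " + all.isDone());
  }
}
```

In this sketch, a close path would wait on `allCompleted()` instead of the last flush future before releasing resources such as the client connection. Flushes registered after `allCompleted()` is called are not covered, so the caller would still need to stop accepting new flushes first.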
## What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-11193
## How was this patch tested?
Ran TestHSync#testConcurrentWriteHSync 100 times.
Before: https://github.com/apache/ozone/actions/runs/9961420355
After: https://github.com/duongkame/ozone/actions/runs/10002033074