[ https://issues.apache.org/jira/browse/HDDS-2026?focusedWorklogId=301205&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301205 ]
ASF GitHub Bot logged work on HDDS-2026:
----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Aug/19 12:02
Start Date: 26/Aug/19 12:02
Worklog Time Spent: 10m
Work Description: adoroszlai commented on pull request #1349: HDDS-2026. Overlapping chunk region cannot be read concurrently
URL: https://github.com/apache/hadoop/pull/1349
## What changes were proposed in this pull request?
Allow only one read or write operation at a time for a given path in `ChunkUtils`,
to avoid the `OverlappingFileLockException` caused by concurrent reads. Unlike simply
synchronizing the methods, this still allows concurrent reads/writes of separate
files. It might be improved later by storing and reusing the file lock.
Also use plain `FileChannel` instead of `AsynchronousFileChannel` for reading,
since the latter was only used synchronously (by calling `.get()`) anyway.
https://issues.apache.org/jira/browse/HDDS-2026
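For readers who want a concrete picture of per-path exclusion, here is a minimal sketch. It is not the code from this pull request: the class `PerPathExclusion` and the methods `runExclusively` and `readChunk` are hypothetical names, and spinning with `Thread.yield()` is a simplification chosen to keep the example short. It assumes callers pass equal `Path` values for the same file.

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper illustrating per-path exclusion; not the actual ChunkUtils code. */
final class PerPathExclusion {
  // Paths that currently have a read or write in progress.
  private static final Set<Path> IN_USE = ConcurrentHashMap.newKeySet();

  /** Runs op while no other thread is operating on the same path. */
  static <T> T runExclusively(Path path, Callable<T> op) throws Exception {
    // add() is atomic and returns false while another thread holds this path.
    while (!IN_USE.add(path)) {
      Thread.yield();
    }
    try {
      return op.call();
    } finally {
      IN_USE.remove(path);
    }
  }

  /** Reads up to len bytes starting at offset, using a plain (synchronous) FileChannel. */
  static ByteBuffer readChunk(Path path, long offset, int len) throws Exception {
    return runExclusively(path, () -> {
      ByteBuffer buf = ByteBuffer.allocate(len);
      try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
        long pos = offset;
        while (buf.hasRemaining()) {
          int n = ch.read(buf, pos); // positional read, returns -1 at end of file
          if (n < 0) {
            break;
          }
          pos += n;
        }
      }
      buf.flip(); // make the bytes that were read available to the caller
      return buf;
    });
  }
}
```

Operations on different paths never wait for each other in this sketch, which is the property that plain `synchronized` methods would lose.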
## How was this patch tested?
Added a unit test.
Also used the improved Freon tool from #1341 to read the same key from
multiple threads (which is how the bug was found in the first place).
```
$ ozone freon ockg -n 1 -p asdf
$ ozone sh key list vol1/bucket1
[ {
"version" : 0,
"size" : 10240,
"keyName" : "asdf/0"
...
} ]
$ ozone freon ocokr -k 'asdf/0'
...
mean rate = 164.39 calls/second
...
mean = 53.75 milliseconds
stddev = 42.11 milliseconds
median = 44.05 milliseconds
...
Total execution time (sec): 6
Failures: 0
Successful executions: 1000
```
Issue Time Tracking
-------------------
Worklog Id: (was: 301205)
Remaining Estimate: 0h
Time Spent: 10m
> Overlapping chunk region cannot be read concurrently
> ----------------------------------------------------
>
> Key: HDDS-2026
> URL: https://issues.apache.org/jira/browse/HDDS-2026
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Datanode
> Reporter: Doroszlai, Attila
> Assignee: Doroszlai, Attila
> Priority: Critical
> Labels: pull-request-available
> Attachments: HDDS-2026-repro.patch, changes.diff,
> first-cut-proposed.diff
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Concurrent requests to datanode for the same chunk may result in the
> following exception in datanode:
> {code}
> java.nio.channels.OverlappingFileLockException
> at java.base/sun.nio.ch.FileLockTable.checkList(FileLockTable.java:229)
> at java.base/sun.nio.ch.FileLockTable.add(FileLockTable.java:123)
> at java.base/sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178)
> at java.base/sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185)
> at java.base/sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118)
> at org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:175)
> at org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerImpl.readChunk(ChunkManagerImpl.java:213)
> at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleReadChunk(KeyValueHandler.java:574)
> at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:195)
> at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:271)
> at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148)
> at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:73)
> at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:61)
> {code}
> The retry logic seems to cover this, as the key read eventually succeeds on
> the client side.
> The problem is that:
> bq. File locks are held on behalf of the entire Java virtual machine. They
> are not suitable for controlling access to a file by multiple threads within
> the same virtual machine.
> ([source|https://docs.oracle.com/javase/8/docs/api/java/nio/channels/FileLock.html])
> code ref:
> [{{ChunkUtils.readData}}|https://github.com/apache/hadoop/blob/c92de8209d1c7da9e7ce607abeecb777c4a52c6a/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/helpers/ChunkUtils.java#L175]
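The JVM-wide nature of file locks quoted above can be reproduced in isolation. The sketch below is an assumption-laden illustration, not the datanode code: it uses `FileChannel` rather than the `AsynchronousFileChannel` seen in the stack trace, and the class `OverlappingLockDemo` and method `secondLockRejected` are hypothetical names. It holds a shared lock on a region through one channel and then requests an overlapping shared lock through a second channel in the same JVM.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical demo of overlapping lock requests within a single JVM. */
final class OverlappingLockDemo {

  /** Returns true if a second lock on the same region is rejected by this JVM. */
  static boolean secondLockRejected(Path file) throws IOException {
    try (FileChannel first = FileChannel.open(file, StandardOpenOption.READ);
         FileChannel second = FileChannel.open(file, StandardOpenOption.READ);
         FileLock held = first.lock(0, 1024, true)) { // shared lock on bytes 0..1023
      try (FileLock again = second.lock(0, 1024, true)) {
        return false; // the overlapping request was granted
      } catch (OverlappingFileLockException e) {
        return true; // the JVM already holds a lock overlapping this region
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("chunk", ".bin");
    try {
      Files.write(file, new byte[1024]);
      System.out.println("second lock rejected: " + secondLockRejected(file));
    } finally {
      Files.deleteIfExists(file);
    }
  }
}
```

Even though both requests ask for a shared lock, the overlap check applies to the whole virtual machine, so threads inside one datanode process cannot rely on `FileLock` to coordinate among themselves.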