[
https://issues.apache.org/jira/browse/HDDS-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914521#comment-16914521
]
Anu Engineer commented on HDDS-2026:
------------------------------------
The no-locking version is in changes.diff. We rely on the semantics of Ozone
here: chunks are by definition never overwritten, and writes become visible
only when we commit the metadata. This guarantees that read threads never
access a chunk file while it is still being written, which allows us to read
without locks, since chunk files are immutable.
Delete is not an issue either: if the file is open, the OS will keep it around
until we are done.
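
To illustrate the idea (this is only a sketch, not the code in changes.diff):
a read of an immutable chunk file needs nothing more than a positional read on
a read-only channel. The class name LockFreeChunkRead and the helpers
readChunk and asString are hypothetical names made up for this example; the
real read path is ChunkUtils.readData.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class LockFreeChunkRead {

    // Hypothetical helper: reads up to `length` bytes starting at `offset`
    // without taking any file lock. Assumes the file is never modified
    // after it becomes visible to readers.
    static ByteBuffer readChunk(Path chunkFile, long offset, int length)
            throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(length);
        try (FileChannel ch = FileChannel.open(chunkFile, StandardOpenOption.READ)) {
            long pos = offset;
            while (buf.hasRemaining()) {
                int n = ch.read(buf, pos); // positional read, channel position untouched
                if (n < 0) {
                    break; // reached end of file
                }
                pos += n;
            }
        }
        buf.flip();
        return buf;
    }

    // Hypothetical helper: decodes the readable bytes of the buffer as ASCII.
    static String asString(ByteBuffer buf) {
        return new String(buf.array(), 0, buf.remaining(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws Exception {
        Path chunk = Files.createTempFile("chunk", ".data");
        Files.write(chunk, "0123456789".getBytes(StandardCharsets.US_ASCII));
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Eight concurrent reads of the same overlapping region.
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < 8; i++) {
                results.add(pool.submit(() -> asString(readChunk(chunk, 2, 5))));
            }
            for (Future<String> r : results) {
                System.out.println(r.get()); // prints "23456"
            }
        } finally {
            pool.shutdown();
            Files.deleteIfExists(chunk);
        }
    }
}
```

Each reader opens its own channel, so a delete issued while a read is in
progress does not invalidate the open handle on POSIX-style file systems.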
> Overlapping chunk region cannot be read concurrently
> ----------------------------------------------------
>
> Key: HDDS-2026
> URL: https://issues.apache.org/jira/browse/HDDS-2026
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Datanode
> Reporter: Doroszlai, Attila
> Priority: Critical
> Attachments: HDDS-2026-repro.patch, changes.diff,
> first-cut-proposed.diff
>
>
> Concurrent requests to datanode for the same chunk may result in the
> following exception in datanode:
> {code}
> java.nio.channels.OverlappingFileLockException
> at java.base/sun.nio.ch.FileLockTable.checkList(FileLockTable.java:229)
> at java.base/sun.nio.ch.FileLockTable.add(FileLockTable.java:123)
> at java.base/sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178)
> at java.base/sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185)
> at java.base/sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118)
> at org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:175)
> at org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerImpl.readChunk(ChunkManagerImpl.java:213)
> at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleReadChunk(KeyValueHandler.java:574)
> at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:195)
> at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:271)
> at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148)
> at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:73)
> at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:61)
> {code}
> It seems this is covered by retry logic, as the key read is eventually
> successful on the client side.
> The problem is that:
> bq. File locks are held on behalf of the entire Java virtual machine. They
> are not suitable for controlling access to a file by multiple threads within
> the same virtual machine.
> ([source|https://docs.oracle.com/javase/8/docs/api/java/nio/channels/FileLock.html])
> code ref:
> [{{ChunkUtils.readData}}|https://github.com/apache/hadoop/blob/c92de8209d1c7da9e7ce607abeecb777c4a52c6a/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/helpers/ChunkUtils.java#L175]
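
The failure quoted above can be reproduced in isolation. The sketch below is
an assumption-laden simplification: it uses FileChannel rather than the
AsynchronousFileChannel seen in the stack trace, and the class name
OverlappingLockDemo and the method secondLockOverlaps are made up for this
example. It takes a shared lock on a file through one channel and then asks
for an overlapping shared lock through a second channel in the same JVM.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class OverlappingLockDemo {

    // Hypothetical helper: returns true if a second overlapping lock request
    // from the same JVM is rejected with OverlappingFileLockException.
    static boolean secondLockOverlaps(Path file) throws IOException {
        try (FileChannel a = FileChannel.open(file, StandardOpenOption.READ);
             FileChannel b = FileChannel.open(file, StandardOpenOption.READ)) {
            // Shared lock over the whole file, held on behalf of the JVM.
            FileLock first = a.lock(0, Long.MAX_VALUE, true);
            try {
                // Overlapping request through another channel, same JVM.
                b.lock(0, Long.MAX_VALUE, true).release();
                return false;
            } catch (OverlappingFileLockException e) {
                return true;
            } finally {
                first.release();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path chunk = Files.createTempFile("chunk", ".data");
        try {
            Files.write(chunk, "0123456789".getBytes(StandardCharsets.US_ASCII));
            System.out.println("overlap rejected: " + secondLockOverlaps(chunk));
        } finally {
            Files.deleteIfExists(chunk);
        }
    }
}
```

Even though both requests are shared (read) locks, the JVM-wide lock table
rejects the second one, which matches the FileLock documentation cited in the
description.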