[ https://issues.apache.org/jira/browse/HDDS-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914488#comment-16914488 ]

Anu Engineer commented on HDDS-2026:
------------------------------------

The first-cut proposed fix solves the problem, but it does so by serializing all 
reads and writes. If needed, we can refine it by adding a reader/writer lock per 
chunk file, so that concurrent reads of the same chunk can proceed in parallel. 
In the Ozone case, writes are not visible to readers until we commit the 
metadata, so a concurrent read and write of the same data will NOT happen. If we 
assume that is true, then readers can read without holding a lock at all. Let us 
chat about both these options.
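A minimal sketch of the reader/writer-lock option, assuming one lock per chunk file held in a concurrent map (the names {{ChunkFileLocks}}, {{withReadLock}} and {{withWriteLock}} are hypothetical, not existing Ozone API):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/**
 * One ReentrantReadWriteLock per chunk file: concurrent readers of the
 * same chunk do not block each other, while a writer excludes them all.
 * Unlike java.nio FileLock, these locks coordinate threads within the JVM.
 */
public class ChunkFileLocks {
    private final ConcurrentHashMap<String, ReentrantReadWriteLock> locks =
        new ConcurrentHashMap<>();

    private ReentrantReadWriteLock lockFor(String chunkFile) {
        // Lazily create exactly one lock per chunk file name.
        return locks.computeIfAbsent(chunkFile, f -> new ReentrantReadWriteLock());
    }

    public <T> T withReadLock(String chunkFile, Supplier<T> read) {
        ReentrantReadWriteLock.ReadLock rl = lockFor(chunkFile).readLock();
        rl.lock();
        try {
            return read.get();
        } finally {
            rl.unlock();
        }
    }

    public void withWriteLock(String chunkFile, Runnable write) {
        ReentrantReadWriteLock.WriteLock wl = lockFor(chunkFile).writeLock();
        wl.lock();
        try {
            write.run();
        } finally {
            wl.unlock();
        }
    }

    public static void main(String[] args) {
        ChunkFileLocks locks = new ChunkFileLocks();
        locks.withWriteLock("chunk_1", () -> { /* write chunk data */ });
        String data = locks.withReadLock("chunk_1", () -> "chunk data");
        System.out.println(data);
    }
}
```

A production version would also have to evict map entries for deleted chunk files to keep the map bounded; that bookkeeping is omitted here.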

> Overlapping chunk region cannot be read concurrently
> ----------------------------------------------------
>
>                 Key: HDDS-2026
>                 URL: https://issues.apache.org/jira/browse/HDDS-2026
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Datanode
>            Reporter: Doroszlai, Attila
>            Priority: Critical
>         Attachments: HDDS-2026-repro.patch, first-cut-proposed.diff
>
>
> Concurrent requests to datanode for the same chunk may result in the 
> following exception in datanode:
> {code}
> java.nio.channels.OverlappingFileLockException
>    at java.base/sun.nio.ch.FileLockTable.checkList(FileLockTable.java:229)
>    at java.base/sun.nio.ch.FileLockTable.add(FileLockTable.java:123)
>    at java.base/sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178)
>    at java.base/sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185)
>    at java.base/sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118)
>    at org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:175)
>    at org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerImpl.readChunk(ChunkManagerImpl.java:213)
>    at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleReadChunk(KeyValueHandler.java:574)
>    at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:195)
>    at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:271)
>    at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148)
>    at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:73)
>    at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:61)
> {code}
> It seems this is covered by retry logic, as key read is eventually successful 
> at client side.
> The problem is that:
> bq. File locks are held on behalf of the entire Java virtual machine. They 
> are not suitable for controlling access to a file by multiple threads within 
> the same virtual machine. 
> ([source|https://docs.oracle.com/javase/8/docs/api/java/nio/channels/FileLock.html])
> code ref: 
> [{{ChunkUtils.readData}}|https://github.com/apache/hadoop/blob/c92de8209d1c7da9e7ce607abeecb777c4a52c6a/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/helpers/ChunkUtils.java#L175]
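The FileLock limitation quoted above is easy to reproduce outside Ozone. This standalone sketch uses a synchronous FileChannel rather than the AsynchronousFileChannel in the trace, but both feed the same JVM-wide lock table, so even two shared locks on overlapping regions from the same JVM collide:

```java
import java.io.File;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.StandardOpenOption;

public class FileLockDemo {
    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("chunk", ".dat");
        tmp.deleteOnExit();
        try (FileChannel ch = FileChannel.open(tmp.toPath(),
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Shared (read) lock on bytes 0..99.
            FileLock first = ch.lock(0, 100, true);
            try {
                // A second lock on an overlapping region from the SAME JVM
                // fails immediately, even though it is also a shared lock:
                // FileLock coordinates processes, not threads.
                ch.lock(50, 100, true);
                System.out.println("no exception");
            } catch (OverlappingFileLockException e) {
                System.out.println("OverlappingFileLockException");
            } finally {
                first.release();
            }
        }
    }
}
// prints "OverlappingFileLockException"
```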



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]