[
https://issues.apache.org/jira/browse/HDDS-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883727#comment-16883727
]
Shashikant Banerjee edited comment on HDDS-1753 at 7/22/19 3:06 PM:
--------------------------------------------------------------------
The issue being caused here is that, while data is still pending replication
from the leader to the followers, a key delete can remove a block in a closed
container on the leader. When a follower then asks the leader for the chunk
data, the request fails because the chunk file no longer exists on the leader.
The solutions being proposed here are as follows:
1) Whenever a delete command is received on a datanode from SCM, it should
first check the min replicated index across all the servers in the pipeline.
ContainerStateMachine will also track the close-container log index for each
container. Now, if the container is closed and the min replicated index >=
the container BCSID on the leader, a delete operation will be queued over
Ratis on the leader, the command received directly from SCM will be ignored
on the followers, and the delete will thus happen over Ratis. If the
close-container index is not yet replicated, the delete transaction will not
be enqueued over Ratis and will simply be ignored; SCM already has a retry
policy in place to retry the same delete.
If the Ratis pipeline does not exist, the delete will work as is.
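As a minimal, purely illustrative sketch of this check (the class, method and
parameter names below, e.g. minReplicatedIndex and containerBcsId, are
assumptions for illustration, not actual Ozone/Ratis APIs):
{code}
// Illustrative-only sketch of approach 1: decide what to do with an SCM
// delete command based on pipeline existence, container state, the min
// replicated index across the pipeline, and the container's BCSID.
public final class DeleteGateSketch {

  enum ContainerState { OPEN, CLOSING, CLOSED }

  enum DeleteDecision {
    ENQUEUE_OVER_RATIS,       // leader queues the delete over Ratis; followers
                              // ignore the copy received directly from SCM
    IGNORE_AND_LET_SCM_RETRY, // close-container index not replicated everywhere yet
    DELETE_DIRECTLY           // no Ratis pipeline exists, delete works as is
  }

  static DeleteDecision decide(boolean ratisPipelineExists,
                               ContainerState state,
                               long minReplicatedIndex,
                               long containerBcsId) {
    if (!ratisPipelineExists) {
      return DeleteDecision.DELETE_DIRECTLY;
    }
    if (state == ContainerState.CLOSED && minReplicatedIndex >= containerBcsId) {
      return DeleteDecision.ENQUEUE_OVER_RATIS;
    }
    return DeleteDecision.IGNORE_AND_LET_SCM_RETRY;
  }

  public static void main(String[] args) {
    // All followers have replicated past the container's BCSID: safe to queue.
    System.out.println(decide(true, ContainerState.CLOSED, 9800L, 9770L));
    // A follower still lags behind the BCSID: ignore and rely on SCM retry.
    System.out.println(decide(true, ContainerState.CLOSED, 9600L, 9770L));
  }
}
{code}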
2) In this approach, whenever a delete request arrives at a datanode, it
should first check the container state and get the min replicated index of
the Ratis server. If the container is closed and the min replicated index of
the server is greater than the container BCSID, the delete will be executed;
otherwise it will fail.
Across node restarts, say after the delete has already happened, any
putBlock/writeChunk entries encountered while reapplying the logs that refer
to already-deleted data can safely be ignored, provided the closed container
state is persisted across restarts.
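A similar illustrative sketch of the local check in approach 2 together with
the restart re-apply rule (again, all names here are hypothetical
placeholders rather than real Ozone code):
{code}
// Illustrative-only sketch of approach 2: a datanode-local check against the
// Ratis server's min replicated index, plus skipping of putBlock/writeChunk
// log entries re-applied after a restart for an already-closed container.
public final class LocalDeleteCheckSketch {

  enum TxType { PUT_BLOCK, WRITE_CHUNK, DELETE_BLOCK }

  /** Execute the delete only if the container is closed and replication on
   *  the Ratis server has moved past the container's BCSID. */
  static boolean canExecuteDelete(boolean containerClosed,
                                  long minReplicatedIndex,
                                  long containerBcsId) {
    return containerClosed && minReplicatedIndex > containerBcsId;
  }

  /** While re-applying the log after a restart, putBlock/writeChunk entries
   *  can be ignored if the container's CLOSED state was persisted, since the
   *  data they reference may legitimately have been deleted already. */
  static boolean skipOnReapply(TxType type, boolean closedStatePersisted) {
    return closedStatePersisted
        && (type == TxType.PUT_BLOCK || type == TxType.WRITE_CHUNK);
  }

  public static void main(String[] args) {
    System.out.println(canExecuteDelete(true, 9800L, 9770L)); // true: delete
    System.out.println(canExecuteDelete(true, 9700L, 9770L)); // false: fail, SCM retries
    System.out.println(skipOnReapply(TxType.WRITE_CHUNK, true));  // true: ignore on replay
    System.out.println(skipOnReapply(TxType.DELETE_BLOCK, true)); // false: apply
  }
}
{code}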
3) The third approach is a slight deviation from the 2nd one. Thanks [~ljain]
for suggesting this.
The idea here is to allow deletes only for those blocks whose BCSID is less
than or equal to the last consistent point, which may be the Ratis purge
index or the last readable snapshot index. This solves the restart problem,
because all transactions reapplied after a restart will start only after the
last consistent point.
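And an illustrative sketch of the approach 3 condition, where
lastConsistentPoint stands for whichever of the Ratis purge index or the last
readable snapshot index is used (names are illustrative only, not real APIs):
{code}
// Illustrative-only sketch of approach 3: allow a block delete only when the
// block's BCSID is at or below the last consistent point. Since log re-apply
// after a restart starts strictly after this point, such a block can never be
// re-written by replayed putBlock/writeChunk entries.
public final class ConsistentPointDeleteSketch {

  static boolean isDeleteAllowed(long blockBcsId, long lastConsistentPoint) {
    return blockBcsId <= lastConsistentPoint;
  }

  public static void main(String[] args) {
    long lastConsistentPoint = 9770L; // e.g. the last readable snapshot index
    System.out.println(isDeleteAllowed(9760L, lastConsistentPoint)); // true
    System.out.println(isDeleteAllowed(9780L, lastConsistentPoint)); // false
  }
}
{code}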
was (Author: shashikant):
The issue being caused here is that, while data is still pending replication
from the leader to the followers, a key delete can remove a block in a closed
container on the leader. When a follower then asks the leader for the chunk
data, the request fails because the chunk file no longer exists on the leader.
The solution being proposed here is as follows:
Whenever a delete command is received on a datanode from SCM, it should first
check the min replicated index across all the servers in the pipeline.
ContainerStateMachine will also track the close-container log index for each
container. Now, if the min replicated index >= the close-container index on
the leader, a delete operation will be queued over Ratis on the leader, the
command received directly from SCM will be ignored on the followers, and the
delete will happen over Ratis. If the close-container index is not yet
replicated, the delete transaction will not be enqueued over Ratis and will be
ignored; SCM already has a retry policy in place to retry the same delete.
If the Ratis pipeline does not exist, the delete will work as is.
> Datanode unable to find chunk while replicating data using ratis.
> -----------------------------------------------------------------
>
> Key: HDDS-1753
> URL: https://issues.apache.org/jira/browse/HDDS-1753
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Datanode
> Affects Versions: 0.4.0
> Reporter: Mukul Kumar Singh
> Assignee: Shashikant Banerjee
> Priority: Major
> Labels: MiniOzoneChaosCluster
>
> The leader datanode is unable to read a chunk from the datanode while
> replicating data from the leader to the follower.
> Please note that deletion of keys is also happening while the data is being
> replicated.
> {code}
> 2019-07-02 19:39:22,604 INFO impl.RaftServerImpl
> (RaftServerImpl.java:checkInconsistentAppendEntries(972)) -
> 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries.
> Reply:76a3eb0f-d7cd-477b-8973-db1014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#70:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
> 2019-07-02 19:39:22,605 ERROR impl.ChunkManagerImpl
> (ChunkUtils.java:readData(161)) - Unable to find the chunk file. chunk info :
> ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3-4d64-93d8-fa2ebafee933_chunk_1, offset=0, len=2048}
> 2019-07-02 19:39:22,605 INFO impl.RaftServerImpl
> (RaftServerImpl.java:checkInconsistentAppendEntries(990)) -
> 5ac88709-a3a2-4c8f-91de-5e54b617f05e: Failed appendEntries as latest snapshot
> (9770) already has the append entries (first index: 1)
> 2019-07-02 19:39:22,605 INFO impl.RaftServerImpl
> (RaftServerImpl.java:checkInconsistentAppendEntries(972)) -
> 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries.
> Reply:76a3eb0f-d7cd-477b-8973-db1014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#71:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
> 2019-07-02 19:39:22,605 INFO keyvalue.KeyValueHandler
> (ContainerUtils.java:logAndReturnError(146)) - Operation: ReadChunk : Trace
> ID: 4216d461a4679e17:4216d461a4679e17:0:0 : Message: Unable to find the chunk file. chunk info
> ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3-4d64-93d8-fa2ebafee933_chunk_1,
> offset=0, len=2048} : Result: UNABLE_TO_FIND_CHUNK
> 2019-07-02 19:39:22,605 INFO impl.RaftServerImpl
> (RaftServerImpl.java:checkInconsistentAppendEntries(990)) -
> 5ac88709-a3a2-4c8f-91de-5e54b617f05e: Failed appendEntries as latest snapshot
> (9770) already has the append entries (first index: 2)
> 2019-07-02 19:39:22,606 INFO impl.RaftServerImpl
> (RaftServerImpl.java:checkInconsistentAppendEntries(972)) -
> 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries.
> Reply:76a3eb0f-d7cd-477b-8973-db1014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#72:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
> 19:39:22.606 [pool-195-thread-19] ERROR DNAudit - user=null | ip=null |
> op=READ_CHUNK {blockData=conID: 3 locID: 102372189549953034 bcsId: 0} |
> ret=FAILURE
> java.lang.Exception: Unable to find the chunk file. chunk info
> ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3-4d64-93d8-fa2ebafee933_chunk_1,
> offset=0, len=2048}
> at
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:320)
> ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148)
> ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:346)
> ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.readStateMachineData(ContainerStateMachine.java:476)
> ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$getCachedStateMachineData$2(ContainerStateMachine.java:495)
> ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at
> com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767)
> ~[guava-11.0.2.jar:?]
> at
> com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
> ~[guava-11.0.2.jar:?]
> at
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
> ~[guava-11.0.2.jar:?]
> at
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
> ~[guava-11.0.2.jar:?]
> at
> com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
> ~[guava-11.0.2.jar:?]
> at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
> ~[guava-11.0.2.jar:?]
> at
> com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
> ~[guava-11.0.2.jar:?]
> at
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.getCachedStateMachineData(ContainerStateMachine.java:494)
> ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$readStateMachineData$4(ContainerStateMachine.java:542)
> ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> [?:1.8.0_171]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [?:1.8.0_171]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [?:1.8.0_171]
> {code}