[ https://issues.apache.org/jira/browse/HDDS-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883727#comment-16883727 ]

Shashikant Banerjee edited comment on HDDS-1753 at 7/22/19 3:06 PM:
--------------------------------------------------------------------

The issue here is that data still has to be replicated from the leader to the 
followers, yet a key delete can cause a block in a closed container to be deleted 
on the leader. When a follower then asks the leader for the chunk data, the 
request fails because the chunk file no longer exists on the leader.

The solutions proposed here are as follows:

1) Whenever a delete command is received on a datanode from SCM, it should 
first check the minimum replicated index across all the servers in the pipeline. 
ContainerStateMachine will also track the close-container log index for each 
container. Now, if the container is closed and the min replicated index >= the 
container's BCSID on the leader, a delete operation will be queued over Ratis on 
the leader, the same SCM command will be ignored on the followers, and the delete 
will then happen through Ratis. If the close-container index has not yet been 
replicated, the delete transaction will never be enqueued over Ratis and will 
simply be dropped; SCM already has a retry policy in place to retry the same 
delete.
If the Ratis pipeline does not exist, the delete will work as it does today.
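
A minimal sketch of this decision (with made-up types and names, not the actual 
ContainerStateMachine/SCM code) could look like the following:

{code:java}
// Sketch only: ContainerInfo and the surrounding plumbing are hypothetical stand-ins.
final class DeleteBlockGate {

  /** Hypothetical snapshot of what the state machine would track per container. */
  record ContainerInfo(boolean closed, long bcsId, long closeLogIndex) {}

  /**
   * Decide on the leader whether the SCM delete command may be queued over Ratis.
   * Followers ignore the SCM command and only apply the Ratis-replicated delete.
   */
  static boolean mayEnqueueDeleteOverRatis(ContainerInfo container,
                                           long minReplicatedIndex,
                                           boolean ratisPipelineExists) {
    if (!ratisPipelineExists) {
      return true;  // no Ratis pipeline: delete proceeds as it does today
    }
    // Enqueue only once the container is closed and every replica has applied
    // at least up to the container's BCSID; otherwise drop the command and rely
    // on SCM retrying the same delete later.
    return container.closed() && minReplicatedIndex >= container.bcsId();
  }
}
{code}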

2) In this approach, whenever a delete request arrives at a datanode, it should 
first check the container state and get the min replicated index of the Ratis 
server. If the container is closed and the min replicated index of the server is 
greater than the container's BCSID, the delete will be executed; otherwise it 
will fail.

Across node restarts, say after the delete has already happened, any 
putBlocks/WriteChunks encountered while reapplying the logs that refer to 
already-deleted blocks can safely be ignored, as long as the container's closed 
state is persisted across restarts.
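
A rough sketch of this check and of the replay-time skip, again with illustrative 
types rather than the real KeyValueHandler/ContainerStateMachine code:

{code:java}
// Sketch only: types and method names are illustrative, not real Ozone APIs.
final class ClosedContainerDeleteCheck {

  record ContainerInfo(boolean closed, long bcsId) {}

  /** Execute the SCM delete only when all replicas have applied past the container's BCSID. */
  static void handleScmDelete(ContainerInfo container, long minReplicatedIndex,
                              Runnable deleteBlocks) {
    if (container.closed() && minReplicatedIndex > container.bcsId()) {
      deleteBlocks.run();
    } else {
      // Fail the command; SCM's existing retry policy resends the same delete.
      throw new IllegalStateException("delete rejected, container not safely replicated yet");
    }
  }

  /**
   * While reapplying Ratis log entries after a restart, a putBlock/writeChunk for a
   * block that was already deleted can be skipped, provided the container's closed
   * state was persisted before the delete ran.
   */
  static boolean shouldSkipReplayedWrite(ContainerInfo container, boolean blockAlreadyDeleted) {
    return container.closed() && blockAlreadyDeleted;
  }
}
{code}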

3) The third approach is a slight deviation from the second one. Thanks [~ljain] 
for suggesting this.
The idea here is to allow deletes only for those blocks whose BCSID is less than 
or equal to the last consistent point, which may be the Ratis purge index or the 
last readable snapshot index. This solves the restart problem because all 
transactions reapplied after a restart start only after the last consistent 
point.
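
A minimal sketch of that condition, assuming the caller can supply the last 
consistent point (whichever of the purge index or snapshot index applies):

{code:java}
// Sketch only: the machinery for obtaining these indexes is assumed, not shown.
final class LastConsistentPointDeleteGate {

  /**
   * Allow deletion only for blocks whose BCSID is at or below the last consistent
   * point (the Ratis purge index or the last readable snapshot index). Log replay
   * after a restart starts strictly after this point, so such deletes can never
   * collide with reapplied putBlock/writeChunk entries.
   */
  static boolean mayDeleteBlock(long blockBcsId, long lastConsistentPoint) {
    return blockBcsId <= lastConsistentPoint;
  }
}
{code}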


was (Author: shashikant):
The issue here is that data still has to be replicated from the leader to the 
followers, yet a key delete can cause a block in a closed container to be deleted 
on the leader. When a follower then asks the leader for the chunk data, the 
request fails because the chunk file no longer exists on the leader.

The solution being proposed here is as follows:

Whenever a delete command is received on a datanode from SCM, it should first 
check the minimum replicated index across all the servers in the pipeline. 
ContainerStateMachine will also track the close-container log index for each 
container. Now, if the min replicated index >= the close-container index on the 
leader, a delete operation will be queued over Ratis on the leader, the same SCM 
command will be ignored on the followers, and the delete will then happen through 
Ratis. If the close-container index has not been replicated, the delete 
transaction will never be enqueued over Ratis and will be dropped. SCM already 
has a retry policy in place to retry the same delete.

If the Ratis pipeline does not exist, the delete will work as it does today.

> Datanode unable to find chunk while replication data using ratis.
> -----------------------------------------------------------------
>
>                 Key: HDDS-1753
>                 URL: https://issues.apache.org/jira/browse/HDDS-1753
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Datanode
>    Affects Versions: 0.4.0
>            Reporter: Mukul Kumar Singh
>            Assignee: Shashikant Banerjee
>            Priority: Major
>              Labels: MiniOzoneChaosCluster
>
> Leader datanode is unable to read chunk from the datanode while replicating 
> data from leader to follower.
> Please note that deletion of keys is also happening while the data is being 
> replicated.
> {code}
> 2019-07-02 19:39:22,604 INFO  impl.RaftServerImpl (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. Reply:76a3eb0f-d7cd-477b-8973-db1014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#70:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
> 2019-07-02 19:39:22,605 ERROR impl.ChunkManagerImpl (ChunkUtils.java:readData(161)) - Unable to find the chunk file. chunk info : ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3-4d64-93d8-fa2ebafee933_chunk_1, offset=0, len=2048}
> 2019-07-02 19:39:22,605 INFO  impl.RaftServerImpl (RaftServerImpl.java:checkInconsistentAppendEntries(990)) - 5ac88709-a3a2-4c8f-91de-5e54b617f05e: Failed appendEntries as latest snapshot (9770) already has the append entries (first index: 1)
> 2019-07-02 19:39:22,605 INFO  impl.RaftServerImpl (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. Reply:76a3eb0f-d7cd-477b-8973-db1014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#71:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
> 2019-07-02 19:39:22,605 INFO  keyvalue.KeyValueHandler (ContainerUtils.java:logAndReturnError(146)) - Operation: ReadChunk : Trace ID: 4216d461a4679e17:4216d461a4679e17:0:0 : Message: Unable to find the chunk file. chunk info ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3-4d64-93d8-fa2ebafee933_chunk_1, offset=0, len=2048} : Result: UNABLE_TO_FIND_CHUNK
> 2019-07-02 19:39:22,605 INFO  impl.RaftServerImpl (RaftServerImpl.java:checkInconsistentAppendEntries(990)) - 5ac88709-a3a2-4c8f-91de-5e54b617f05e: Failed appendEntries as latest snapshot (9770) already has the append entries (first index: 2)
> 2019-07-02 19:39:22,606 INFO  impl.RaftServerImpl (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. Reply:76a3eb0f-d7cd-477b-8973-db1014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#72:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
> 19:39:22.606 [pool-195-thread-19] ERROR DNAudit - user=null | ip=null | 
> op=READ_CHUNK {blockData=conID: 3 locID: 102372189549953034 bcsId: 0} | 
> ret=FAILURE
> java.lang.Exception: Unable to find the chunk file. chunk info 
> ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3-4d64-93d8-fa2ebafee933_chunk_1,
>  offset=0, len=2048}
>         at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:320)
>  ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
>         at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148)
>  ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:346)
>  ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.readStateMachineData(ContainerStateMachine.java:476)
>  ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$getCachedStateMachineData$2(ContainerStateMachine.java:495)
>  ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
>         at 
> com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767)
>  ~[guava-11.0.2.jar:?]
>         at 
> com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
>  ~[guava-11.0.2.jar:?]
>         at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350) 
> ~[guava-11.0.2.jar:?]
>         at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
>  ~[guava-11.0.2.jar:?]
>         at 
> com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228) 
> ~[guava-11.0.2.jar:?]
>         at com.google.common.cache.LocalCache.get(LocalCache.java:3965) 
> ~[guava-11.0.2.jar:?]
>         at 
> com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764) 
> ~[guava-11.0.2.jar:?]
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.getCachedStateMachineData(ContainerStateMachine.java:494)
>  ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$readStateMachineData$4(ContainerStateMachine.java:542)
>  ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
>         at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>  [?:1.8.0_171]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [?:1.8.0_171]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [?:1.8.0_171]
> {code}


