[ 
https://issues.apache.org/jira/browse/HDDS-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated HDDS-4630:
----------------------------
    Description: 
This deadlock was found while trying to replace the MockRatisServer with a 
single-server SCMRatisServer in MiniOzoneCluster.

It can be reproduced by the test case 
TestContainerStateMachineFlushDelay#testContainerStateMachineFailures after 
replacing the mock Ratis server with the real one.

 

*The root cause is*

When closing a pipeline, SCM first closes the open containers of the pipeline, 
then removes the pipeline itself. The contention here is:
 # ContainerManager has committed the log entry containing 
updateContainerState, and the StateMachineUpdater is applying it, waiting for 
the lock of PipelineManagerV2Impl: when a container transitions from open to a 
non-open state, it calls PipelineManager#removeContainerFromPipeline, which 
needs the lock of PipelineManagerV2Impl.
 # PipelineActionHandler has acquired the lock of PipelineManagerV2Impl inside 
the call to PipelineManagerV2Impl#removePipeline(), and is waiting for 
StateManager#removePipeline to be committed by Raft and applied by the 
StateMachineUpdater.

Thus the ContainerManager occupies the StateMachineUpdater while waiting for 
the lock of PipelineManager, and the PipelineActionHandler holds the lock of 
PipelineManager while waiting for the StateMachineUpdater to apply its Raft 
client request: a circular wait, hence the deadlock.
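The circular wait above can be sketched as a small dependency graph. This is an illustrative model only; the node names mirror the components in the analysis, but the class and method names are hypothetical, not actual Ozone code:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal "depends-on" graph model of the contention described above.
// A cycle in this graph corresponds to the deadlock.
public class WaitForGraph {
    private final Map<String, String> dependsOn = new HashMap<>();

    // Record that 'waiter' cannot make progress until 'blocker' does.
    void addDependency(String waiter, String blocker) {
        dependsOn.put(waiter, blocker);
    }

    // Follow depends-on edges from 'start'; a revisited node means a cycle.
    boolean hasCycle(String start) {
        Set<String> seen = new HashSet<>();
        String cur = start;
        while (cur != null) {
            if (!seen.add(cur)) {
                return true;
            }
            cur = dependsOn.get(cur);
        }
        return false;
    }

    public static void main(String[] args) {
        WaitForGraph g = new WaitForGraph();
        // The StateMachineUpdater (applying updateContainerState) waits for
        // the PipelineManagerV2Impl lock ...
        g.addDependency("StateMachineUpdater", "PipelineManagerLock");
        // ... which is held by the PipelineActionHandler ...
        g.addDependency("PipelineManagerLock", "PipelineActionHandler");
        // ... which waits for the StateMachineUpdater to apply removePipeline.
        g.addDependency("PipelineActionHandler", "StateMachineUpdater");
        System.out.println(g.hasCycle("StateMachineUpdater")); // prints true
    }
}
```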

 

*The solution is*

We have PipelineManager and PipelineStateManager, and ContainerManager and 
ContainerStateManager; each has its own read-write lock.

Let's discuss PipelineManager and PipelineStateManager first.

PipelineStateManager holds both the in-memory state and the RocksDB state. It 
uses a read-write lock to keep the in-memory state and RocksDB consistent. 
This was done in this PR: [https://github.com/apache/ozone/pull/1676]

A write request needs to acquire the write lock before making modifications, 
and a read request needs to acquire the read lock before reading. All the 
write requests come from the StateMachineUpdater, while the read requests 
mainly come from foreground requests, which means all updates are driven 
through Ratis.

In the non-HA code, the rw lock in PipelineManager is the only protection for 
thread-safety; there is no lock in PipelineStateManager. In the HA code, 
however, we have to rely on the rw lock in PipelineStateManager to ensure 
thread-safety.
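The locking discipline described above can be sketched like this. It is a minimal, hypothetical sketch: the real PipelineStateManager stores Pipeline objects and persists to RocksDB, while here plain maps stand in for both:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the PipelineStateManager locking discipline:
// one rw lock keeps the in-memory view and the backing store consistent.
public class StateManagerSketch {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, String> inMemory = new HashMap<>();
    private final Map<String, String> store = new HashMap<>(); // stands in for RocksDB

    // Write path: in HA, called only from the StateMachineUpdater thread,
    // i.e. after the operation has been committed through Ratis.
    public void addPipeline(String id, String info) {
        lock.writeLock().lock();
        try {
            store.put(id, info);     // persist first
            inMemory.put(id, info);  // then update the in-memory view
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Read path: foreground requests take only the read lock, so
    // concurrent reads do not block each other.
    public String getPipeline(String id) {
        lock.readLock().lock();
        try {
            return inMemory.get(id);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```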

Thus, we can relax the lock in PipelineManager and use it only to ensure that 
there is at most one ongoing Ratis operation. Since the previous logic 
acquires the write lock of PipelineManager and then issues the Raft client 
request, we simply keep that pattern.
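The relaxed scheme can be sketched as follows. Again a hypothetical sketch, with a stub standing in for the Ratis client: the point is that the PipelineManager lock now only serializes outgoing Ratis operations, and the apply path (StateMachineUpdater) never takes it, which breaks the circular wait:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: the PipelineManager lock only guarantees at most one
// in-flight Ratis operation. The commit/apply path updates state under the
// state manager's own rw lock and never touches this lock.
public class RemovePipelineSketch {
    private final ReentrantLock ratisOpLock = new ReentrantLock();

    // Stand-in for submitting a request through the Ratis client and
    // waiting for the state machine to apply it.
    private CompletableFuture<Void> submitToRatis(String op) {
        return CompletableFuture.completedFuture(null);
    }

    public void removePipeline(String pipelineId) {
        ratisOpLock.lock(); // serialize outgoing Ratis operations only
        try {
            // Block until the operation is committed and applied. The apply
            // thread does not need ratisOpLock, so this wait cannot deadlock.
            submitToRatis("removePipeline:" + pipelineId).join();
        } finally {
            ratisOpLock.unlock();
        }
    }
}
```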

 

*P.S.* 

The same analysis applies to ContainerManager and ContainerStateManager.



> Solve dead lock when PipelineActionHandler is triggered.
> --------------------------------------------------------
>
>                 Key: HDDS-4630
>                 URL: https://issues.apache.org/jira/browse/HDDS-4630
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: SCM HA
>            Reporter: Glen Geng
>            Assignee: Glen Geng
>            Priority: Major
>         Attachments: PipelineActionHander 1.png, PipelineActionHander 2.png, 
> StateMachineUpdater 1.png, StateMachineUpdater 2.png
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
