[ 
https://issues.apache.org/jira/browse/HDDS-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang resolved HDDS-4630.
----------------------------
    Resolution: Fixed

> Solve deadlock triggered by PipelineActionHandler.
> --------------------------------------------------
>
>                 Key: HDDS-4630
>                 URL: https://issues.apache.org/jira/browse/HDDS-4630
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: SCM HA
>            Reporter: Glen Geng
>            Assignee: Glen Geng
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: PipelineActionHander 1.png, PipelineActionHander 2.png, 
> StateMachineUpdater 1.png, StateMachineUpdater 2.png
>
>
> This deadlock was found while replacing the MockRatisServer with a 
> single-server SCMRatisServer in MiniOzoneCluster.
> It can be reproduced with the test case 
> TestContainerStateMachineFlushDelay#testContainerStateMachineFailures once 
> the mock Ratis server is replaced with the real one.
>  
> *The root cause is*
> When a pipeline is closed, the open containers of that pipeline are closed 
> first, and then the pipeline is removed. The contention is:
>  # ContainerManager has committed the log entry containing 
> updateContainerState, and the StateMachineUpdater is applying it while 
> waiting for the lock of PipelineManagerV2Impl. When a container 
> transitions from open to a non-open state, it has to call 
> PipelineManager#removeContainerFromPipeline, which needs the lock of 
> PipelineManagerV2Impl.
>  # PipelineActionHandler has acquired the lock of PipelineManagerV2Impl 
> during the call to PipelineManagerV2Impl#removePipeline(), and it is waiting 
> for StateManager#removePipeline() to be committed by Raft and applied by 
> the StateMachineUpdater.
> Thus ContainerManager occupies the StateMachineUpdater while waiting for the 
> lock of PipelineManager, and PipelineActionHandler holds the lock of 
> PipelineManager while waiting for the StateMachineUpdater to apply its Raft 
> client request.
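
The cycle described above can be sketched in plain Java. This is a minimal illustration, not the Ozone code: the class DeadlockSketch and the method handlerTimesOut are hypothetical names, a single-threaded executor stands in for the StateMachineUpdater, and a 300 ms timeout is used only so that the demo terminates instead of hanging.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in classes; none of these names come from Ozone.
class DeadlockSketch {

  /** Returns true if the handler's request could not be applied in time. */
  static boolean handlerTimesOut() {
    // Assumption: models the rw lock of PipelineManagerV2Impl.
    ReentrantReadWriteLock pipelineManagerLock = new ReentrantReadWriteLock();
    // Assumption: one thread applies committed entries in order,
    // like the StateMachineUpdater.
    ExecutorService updater = Executors.newSingleThreadExecutor();
    boolean timedOut = false;

    // The handler takes the manager lock, as in removePipeline().
    pipelineManagerLock.writeLock().lock();
    try {
      // Entry 1 (updateContainerState): its apply path needs the manager lock,
      // so the updater thread blocks here.
      updater.submit(() -> {
        pipelineManagerLock.writeLock().lock();
        pipelineManagerLock.writeLock().unlock();
      });
      // Entry 2 (removePipeline): queued behind entry 1 on the same thread.
      Future<?> removePipeline = updater.submit(() -> { });
      // The handler waits for its own entry while still holding the lock.
      removePipeline.get(300, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      timedOut = true; // without the timeout this wait would never end
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pipelineManagerLock.writeLock().unlock(); // lets the demo drain and exit
      updater.shutdown();
    }
    return timedOut;
  }

  public static void main(String[] args) {
    System.out.println("handler timed out: " + handlerTimesOut());
  }
}
```

In this sketch the wait always times out, because the only thread that could apply the handler's entry is blocked on the lock that the handler holds.
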
>  
> *The solution is*
> There are two pairs of classes: PipelineManager and PipelineStateManager, 
> and ContainerManager and ContainerStateManager. Each class has its own rw lock.
> Let's discuss PipelineManager and PipelineStateManager first.
>  
> PipelineStateManager holds the in-memory state and the RocksDB store. It uses 
> a rw lock to keep the in-memory state and RocksDB consistent. This was 
> done in this PR: [https://github.com/apache/ozone/pull/1676]
> A write request has to acquire the write lock before modifying state, and 
> a read request has to acquire the read lock before reading. All write 
> requests come from the StateMachineUpdater, which means every modification 
> goes through Ratis; read requests come mainly from foreground requests.
> In the non-HA code, the rw lock in PipelineManager is the only protection 
> for thread safety, and there is no lock in PipelineStateManager. In the HA 
> code, we have to rely on the rw lock in PipelineStateManager to ensure 
> thread safety.
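
The locking discipline described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the PipelineStateManager implementation: the class StateManagerSketch, its method names, and the use of a second HashMap as a stand-in for RocksDB are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; names and types are illustrative, not Ozone's.
class StateManagerSketch {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, String> inMemory = new HashMap<>();
  // Assumption: a plain map stands in for the RocksDB table.
  private final Map<String, String> store = new HashMap<>();

  /** Write path: in this model it is only called from the applier thread. */
  void applyAddPipeline(String id, String state) {
    lock.writeLock().lock();
    try {
      store.put(id, state);     // persist first
      inMemory.put(id, state);  // then update the in-memory view
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Write path: removes the entry from both views under the write lock. */
  void applyRemovePipeline(String id) {
    lock.writeLock().lock();
    try {
      store.remove(id);
      inMemory.remove(id);
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Read path: foreground requests take only the read lock. */
  String getPipelineState(String id) {
    lock.readLock().lock();
    try {
      return inMemory.get(id);
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Readers never observe the two views out of sync. */
  boolean isConsistent() {
    lock.readLock().lock();
    try {
      return inMemory.equals(store);
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

Because both views are changed inside one write-locked section, a reader holding the read lock sees either the state before a modification or the state after it, never a mix.
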
>  
> Since most of the lock operations in PipelineManager and 
> PipelineStateManager are currently duplicated, we can relax the lock in 
> PipelineManager and use it only to ensure that there is at most one ongoing 
> Ratis operation. The previous logic acquired the write lock of 
> PipelineManager and then issued the Raft client request, so Ratis client 
> requests were serialized; we simply keep that behavior.
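
A rough sketch of the relaxed scheme, assuming that the apply path takes only the state manager's lock once the manager's lock is reduced to serializing Raft submissions. The names RelaxedLockSketch, raftSubmitLock, and stateLock are hypothetical, and a single-threaded executor again stands in for the StateMachineUpdater.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in classes; none of these names come from Ozone.
class RelaxedLockSketch {

  /** Returns true if the handler's request is applied while it holds its lock. */
  static boolean handlerCompletes() {
    // Assumption: the manager lock now only serializes Raft submissions.
    ReentrantReadWriteLock raftSubmitLock = new ReentrantReadWriteLock();
    // Assumption: the state manager guards its own state with a separate lock.
    ReentrantReadWriteLock stateLock = new ReentrantReadWriteLock();
    ExecutorService updater = Executors.newSingleThreadExecutor();

    // Apply path: touches only the state manager lock, never the manager lock.
    Runnable applyEntry = () -> {
      stateLock.writeLock().lock();
      try {
        // in-memory state and backing store would be modified here
      } finally {
        stateLock.writeLock().unlock();
      }
    };

    raftSubmitLock.writeLock().lock(); // at most one in-flight Raft request
    try {
      updater.submit(applyEntry);                            // updateContainerState
      Future<?> removePipeline = updater.submit(applyEntry); // removePipeline
      removePipeline.get(5, TimeUnit.SECONDS);
      return true;
    } catch (TimeoutException e) {
      return false;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      raftSubmitLock.writeLock().unlock();
      updater.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("handler completed: " + handlerCompletes());
  }
}
```

The applier thread never needs the lock held by the submitting thread, so the wait-for cycle from the root cause cannot form in this model.
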
>  
> There is a small drawback: a read request handled by PipelineStateManager 
> may not see the latest update, since there might be one in-flight 
> action. We accept this because:
>  # It does not cause any safety problem.
>  # The current code has the same issue.
>  # It matches our future direction for performance optimization: allowing 
> batched and parallel Raft client requests.
>  
> *P.S.* 
> The same analysis also applies to ContainerManager and 
> ContainerStateManager.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

