[
https://issues.apache.org/jira/browse/HDDS-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rui Wang resolved HDDS-4630.
----------------------------
Resolution: Fixed
> Solve deadlock triggered by PipelineActionHandler.
> --------------------------------------------------
>
> Key: HDDS-4630
> URL: https://issues.apache.org/jira/browse/HDDS-4630
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: SCM HA
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
> Labels: pull-request-available
> Attachments: PipelineActionHander 1.png, PipelineActionHander 2.png,
> StateMachineUpdater 1.png, StateMachineUpdater 2.png
>
>
> This deadlock was found while replacing the MockRatisServer with a
> single-server SCMRatisServer in MiniOzoneCluster.
> It can be reproduced with
> TestContainerStateMachineFlushDelay#testContainerStateMachineFailures once
> the mock Ratis server is replaced with the real one.
>
> *The root cause is*
> Closing a pipeline first closes the open containers of that pipeline and
> then removes the pipeline. The contention is:
> # ContainerManager has committed the log entry containing
> updateContainerState, and the StateMachineUpdater is applying it while
> waiting for the lock of PipelineManagerV2Impl. When a container
> transitions from open to any non-open state, it must call
> PipelineManager#removeContainerFromPipeline, which needs the lock of
> PipelineManagerV2Impl.
> # PipelineActionHandler has acquired the lock of PipelineManagerV2Impl
> inside PipelineManagerV2Impl#removePipeline() and is waiting
> for StateManager#removePipeline() to be committed by Raft and applied by
> the StateMachineUpdater.
> Thus ContainerManager occupies the StateMachineUpdater while waiting for the
> lock of PipelineManager, and PipelineActionHandler holds the lock of
> PipelineManager while waiting for the StateMachineUpdater to apply its Raft
> client request.
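
The cycle described above can be reproduced in miniature. The following is only a sketch under stated assumptions, not the Ozone code: a single-threaded executor stands in for the StateMachineUpdater, a ReentrantReadWriteLock stands in for the lock of PipelineManagerV2Impl, and the class and method names are hypothetical. Bounded waits replace the real unbounded ones so that the example terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the interaction described in this issue.
class DeadlockSketch {

  /** Returns true if the handler's request could not be applied in time. */
  static boolean handlerTimesOut() {
    // Stand-in for the lock of PipelineManagerV2Impl.
    ReentrantReadWriteLock managerLock = new ReentrantReadWriteLock();
    // Stand-in for StateMachineUpdater: one thread applies entries in order.
    ExecutorService updater = Executors.newSingleThreadExecutor();
    CountDownLatch updaterBusy = new CountDownLatch(1);

    // The handler enters removePipeline() and takes the manager lock.
    managerLock.writeLock().lock();
    try {
      // Entry 1: updateContainerState, which needs the manager lock in order
      // to call removeContainerFromPipeline.
      updater.submit(() -> {
        updaterBusy.countDown();
        try {
          // Bounded wait (the real wait is unbounded) so the sketch ends.
          if (managerLock.writeLock().tryLock(2, TimeUnit.SECONDS)) {
            managerLock.writeLock().unlock();
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      updaterBusy.await();

      // Entry 2: the handler's removePipeline request. It queues behind
      // entry 1, which is blocked on the lock the handler still holds.
      Future<?> removePipeline = updater.submit(() -> { });
      try {
        removePipeline.get(500, TimeUnit.MILLISECONDS);
        return false;
      } catch (TimeoutException e) {
        return true; // with unbounded waits this would be a deadlock
      }
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      managerLock.writeLock().unlock();
      updater.shutdownNow();
    }
  }
}
```

Calling DeadlockSketch.handlerTimesOut() returns true: the second entry cannot be applied while the first entry is waiting for a lock that is only released after the second entry completes.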
>
> *The solution is*
> PipelineManager and PipelineStateManager, and likewise ContainerManager and
> ContainerStateManager, each have their own rw lock.
> Let's discuss PipelineManager and PipelineStateManager first.
>
> PipelineStateManager contains the in-memory state and the RocksDB store. It
> uses a rw lock to keep the in-memory state and RocksDB consistent. This is
> done in this PR: [https://github.com/apache/ozone/pull/1676]
> A write request must acquire the write lock before modifying state, and
> a read request must acquire the read lock before reading. All write
> requests come from the StateMachineUpdater, and read requests come mainly
> from foreground requests, which means all modifications are made through
> Ratis. In the non-HA code, the rw lock in PipelineManager is the only
> protection for thread safety; there is no lock in PipelineStateManager. In
> the HA code, we have to rely on the rw lock in PipelineStateManager to
> ensure thread safety.
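
As an illustration of the locking discipline described for PipelineStateManager, here is a minimal sketch. It is not the code from the PR: the class name, the String-based pipeline representation, and the second HashMap that stands in for RocksDB are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: one rw lock keeps two copies of the state in step.
class PipelineStateSketch {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, String> inMemory = new HashMap<>();
  // Stands in for the RocksDB table; a real store would be persistent.
  private final Map<String, String> store = new HashMap<>();

  // Write path: in the HA design this would be invoked by the updater thread.
  void addPipeline(String id, String state) {
    lock.writeLock().lock();
    try {
      store.put(id, state);
      inMemory.put(id, state);
    } finally {
      lock.writeLock().unlock();
    }
  }

  void removePipeline(String id) {
    lock.writeLock().lock();
    try {
      store.remove(id);
      inMemory.remove(id);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Read path: foreground requests share the read lock.
  String getPipeline(String id) {
    lock.readLock().lock();
    try {
      return inMemory.get(id);
    } finally {
      lock.readLock().unlock();
    }
  }

  // True when both copies hold the same entries.
  boolean isConsistent() {
    lock.readLock().lock();
    try {
      return inMemory.equals(store);
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

Because both copies are only modified under the write lock, a reader holding the read lock never observes one copy updated without the other.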
>
> Since most of the locking in PipelineManager and PipelineStateManager is
> currently duplicated, we can relax the lock in PipelineManager and use it
> only to ensure that there is at most one ongoing Ratis operation. The
> previous logic acquired the write lock of PipelineManager and then issued
> the Raft client request, so Raft client requests were serialized; we simply
> preserve that behavior.
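
One way to picture the relaxed scheme is the sketch below. It is an assumption-laden illustration rather than the actual change: submitLock plays the role of the PipelineManager lock and only serializes requests, stateLock plays the role of the PipelineStateManager lock, a single-threaded executor stands in for the StateMachineUpdater, and all names are hypothetical. The apply path takes only stateLock, so it never waits on the lock held by the submitting thread.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the relaxed locking described in this issue.
class RelaxedLockSketch {

  /** Returns true if the pipeline removal was applied while submitLock was held. */
  static boolean removePipelineCompletes() {
    Lock submitLock = new ReentrantLock(); // at most one request in flight
    ReentrantReadWriteLock stateLock = new ReentrantReadWriteLock();
    Map<String, Set<String>> containersByPipeline = new HashMap<>();
    containersByPipeline.put("pipeline-1", new HashSet<>(Set.of("container-1")));
    ExecutorService updater = Executors.newSingleThreadExecutor();

    submitLock.lock(); // the handler starts its request
    try {
      // Entry 1: a container leaves the open state. The apply path needs
      // only the state lock, not submitLock.
      updater.submit(() -> {
        stateLock.writeLock().lock();
        try {
          containersByPipeline.get("pipeline-1").remove("container-1");
        } finally {
          stateLock.writeLock().unlock();
        }
      });
      // Entry 2: the handler's removePipeline request.
      Future<?> removal = updater.submit(() -> {
        stateLock.writeLock().lock();
        try {
          containersByPipeline.remove("pipeline-1");
        } finally {
          stateLock.writeLock().unlock();
        }
      });
      removal.get(2, TimeUnit.SECONDS);

      stateLock.readLock().lock();
      try {
        return !containersByPipeline.containsKey("pipeline-1");
      } finally {
        stateLock.readLock().unlock();
      }
    } catch (TimeoutException e) {
      return false;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      submitLock.unlock();
      updater.shutdownNow();
    }
  }
}
```

Here both entries are applied even though the submitting thread holds submitLock for the whole request, because nothing on the updater thread asks for that lock.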
>
> This has a small drawback: a read request handled by PipelineStateManager
> may not see the latest update, since there might be one in-flight
> action. We accept this because:
> # It won't cause any safety problem.
> # The current code has the same issue.
> # It fits our future direction for performance optimization: allowing
> batched and parallel Raft client requests.
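
The stale-read window can be illustrated with a small sketch. Everything here is a simplification chosen for the example: a concurrent set stands in for the state held by PipelineStateManager, a latch stands in for the Raft commit, and the names are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of a read racing with one in-flight removal.
class StaleReadSketch {

  /** Returns {visibleWhileInFlight, visibleAfterApply} for the removed pipeline. */
  static boolean[] observe() {
    Set<String> pipelines = ConcurrentHashMap.newKeySet();
    pipelines.add("pipeline-1");
    ExecutorService updater = Executors.newSingleThreadExecutor();
    CountDownLatch committed = new CountDownLatch(1); // stands in for the Raft commit
    try {
      // The removal has been requested but is not applied until "committed".
      Future<?> removal = updater.submit(() -> {
        try {
          committed.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        pipelines.remove("pipeline-1");
      });

      // A foreground read during the in-flight window still sees the pipeline.
      boolean whileInFlight = pipelines.contains("pipeline-1");

      committed.countDown();
      removal.get();
      boolean afterApply = pipelines.contains("pipeline-1");
      return new boolean[] {whileInFlight, afterApply};
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      updater.shutdownNow();
    }
  }
}
```

StaleReadSketch.observe() returns {true, false}: the read issued while the removal is in flight sees the old state, and a read after the entry is applied sees the update.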
>
> *P.S.*
> The same analysis applies to ContainerManager and
> ContainerStateManager.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)