[
https://issues.apache.org/jira/browse/HDDS-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Glen Geng updated HDDS-4630:
----------------------------
Description:
This deadlock was found when trying to replace the MockRatisServer with a
single-server SCMRatisServer in MiniOzoneCluster.
It can be reproduced with the test case
TestContainerStateMachineFlushDelay#testContainerStateMachineFailures when
the mock Ratis server is replaced with the real one.
The root cause is that closing a pipeline first closes the open
containers of that pipeline and then removes the pipeline.
The contention here is:
# ContainerManager has committed the log entry containing
updateContainerState, and the StateMachineUpdater is applying this method
and waiting for the lock of PipelineManagerV2Impl. When a container
transitions from open to a non-open state, it needs to call
PipelineManager#removeContainerFromPipeline and therefore needs the lock of
PipelineManagerV2Impl.
#
and is applying
was:
This dead lock is found when trying to replace the MockRatisServer with single
server SCMRatisServer in MiniOzoneCluster.
It can be reproduced by case
TestContainerStateMachineFlushDelay#testContainerStateMachineFailures, by
replacing the mock ratis server with the real one.
> Solve dead lock when PipelineActionHandler is triggered.
> --------------------------------------------------------
>
> Key: HDDS-4630
> URL: https://issues.apache.org/jira/browse/HDDS-4630
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: SCM HA
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
> Attachments: PipelineActionHander 1.png, PipelineActionHander 2.png,
> StateMachineUpdater 1.png, StateMachineUpdater 2.png
>
>
> This deadlock was found when trying to replace the MockRatisServer with a
> single-server SCMRatisServer in MiniOzoneCluster.
> It can be reproduced with the test case
> TestContainerStateMachineFlushDelay#testContainerStateMachineFailures when
> the mock Ratis server is replaced with the real one.
> The root cause is that closing a pipeline first closes the open
> containers of that pipeline and then removes the pipeline.
> The contention here is:
> # ContainerManager has committed the log entry containing
> updateContainerState, and the StateMachineUpdater is applying this method
> and waiting for the lock of PipelineManagerV2Impl. When a container
> transitions from open to a non-open state, it needs to call
> PipelineManager#removeContainerFromPipeline and therefore needs the lock of
> PipelineManagerV2Impl.
> #
> and is applying
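
The first point of the contention describes the StateMachineUpdater blocking on
the PipelineManagerV2Impl lock while it applies updateContainerState. The second
point is cut off in this message, so the sketch below only assumes the usual
shape of such a deadlock: some other thread holds that lock while it waits for
the update to be applied. It is a standalone illustration, not Ozone code. The
class DeadlockSketch, the PIPELINE_LOCK field and both method names are
hypothetical, and a tryLock timeout stands in for the blocking lock() so that
the demo terminates instead of hanging.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockSketch {
  // Hypothetical stand-in for the lock that guards PipelineManagerV2Impl.
  private static final ReentrantLock PIPELINE_LOCK = new ReentrantLock();

  // Stand-in for the updater thread applying updateContainerState.
  // Returns false if the lock could not be taken within the timeout;
  // with a plain lock() call this thread would block indefinitely.
  static boolean applyUpdateContainerState() throws InterruptedException {
    if (!PIPELINE_LOCK.tryLock(200, TimeUnit.MILLISECONDS)) {
      return false;
    }
    try {
      // removeContainerFromPipeline would run here, under the lock.
      return true;
    } finally {
      PIPELINE_LOCK.unlock();
    }
  }

  // Assumed caller side: submit the update to a single "updater" thread and
  // wait for it, either while holding the pipeline lock or without it.
  static boolean closePipeline(boolean holdLockWhileWaiting) throws Exception {
    ExecutorService updater = Executors.newSingleThreadExecutor();
    Callable<Boolean> apply = DeadlockSketch::applyUpdateContainerState;
    try {
      if (holdLockWhileWaiting) {
        PIPELINE_LOCK.lock();
        try {
          Future<Boolean> applied = updater.submit(apply);
          return applied.get(); // waits while the updater waits for the lock
        } finally {
          PIPELINE_LOCK.unlock();
        }
      }
      Future<Boolean> applied = updater.submit(apply);
      return applied.get();
    } finally {
      updater.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(closePipeline(true));  // prints "false": apply never got the lock
    System.out.println(closePipeline(false)); // prints "true": apply proceeds
  }
}
```

In this sketch the update can only be applied once the caller no longer holds the
lock while waiting. Whether the actual fix for this issue takes that route is not
stated in this message.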