[ https://issues.apache.org/jira/browse/HDDS-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824203#comment-16824203 ]
Anu Engineer commented on HDDS-1454:
------------------------------------
One of the hard-learned lessons from my previous job is that systems like SCM
should not make massive changes on their own. Say we are about to close more
than 30% of all pipelines: emit a warning and wait for human intervention, or
slow down enough that the closing stays controlled. In fact, we should have a
pipeline/container close rate limit, that is, we will not close more than x
pipelines or containers per unit time. It is also good to have a big red
button, so that if the system gets into this state, the admin has the ability
to stop this activity by SCM.
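
To make the three safeguards above concrete (a threshold that triggers a
warning, a close rate limit, and a big red button), here is a minimal sketch in
Java. Everything in it is hypothetical: the class name PipelineCloseThrottle,
its methods, and its parameters are illustrative and are not taken from the
Ozone/SCM code base. As a simplification, the fraction check is applied to
closes inside the current time window, and the caller passes the clock value in
explicitly.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical guard for pipeline closes; not part of the real SCM code. */
final class PipelineCloseThrottle {
  private final double maxFraction;  // e.g. 0.30 = at most 30% of all pipelines per window
  private final int maxPerWindow;    // the "x amount" in "x amount per unit time"
  private final long windowMillis;   // the "unit time"
  private final Deque<Long> recentCloses = new ArrayDeque<>();
  private volatile boolean paused;   // the "big red button"

  PipelineCloseThrottle(double maxFraction, int maxPerWindow, long windowMillis) {
    this.maxFraction = maxFraction;
    this.maxPerWindow = maxPerWindow;
    this.windowMillis = windowMillis;
  }

  /** Admin stops all automatic closes until resume() is called. */
  void pause() { paused = true; }

  void resume() { paused = false; }

  /**
   * Returns true and records the close if one more pipeline may be closed now;
   * returns false if the caller must hold off.
   */
  synchronized boolean tryClose(int totalPipelines, long nowMillis) {
    if (paused) {
      return false;
    }
    // Forget closes that have fallen out of the sliding window.
    while (!recentCloses.isEmpty()
        && nowMillis - recentCloses.peekFirst() >= windowMillis) {
      recentCloses.pollFirst();
    }
    int closedInWindow = recentCloses.size();
    if (closedInWindow >= maxPerWindow) {
      return false;  // rate limit reached, try again later
    }
    if (closedInWindow + 1 > maxFraction * totalPipelines) {
      // Too large a share of the cluster: warn and leave it to a human.
      System.err.println("WARN: refusing close, would exceed "
          + (maxFraction * 100) + "% of " + totalPipelines + " pipelines");
      return false;
    }
    recentCloses.addLast(nowMillis);
    return true;
  }
}
```

With such a guard, the code path that destroys pipelines would call tryClose()
before each close and simply skip or defer the close when it returns false, so
a long pause in SCM could not take down every pipeline at once.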
> GC other system pause events can trigger pipeline destroy for all the nodes
> in the cluster
> ------------------------------------------------------------------------------------------
>
> Key: HDDS-1454
> URL: https://issues.apache.org/jira/browse/HDDS-1454
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: SCM
> Affects Versions: 0.3.0
> Reporter: Mukul Kumar Singh
> Priority: Major
> Labels: MiniOzoneChaosCluster
>
> In a MiniOzoneChaosCluster run it was observed that events like GC pauses or
> any other pauses in SCM can mark all the datanodes as stale in SCM. This will
> trigger multiple pipeline destroys and render the system unusable.