[ 
https://issues.apache.org/jira/browse/HDDS-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824203#comment-16824203
 ] 

Anu Engineer commented on HDDS-1454:
------------------------------------

One of the hard-learned lessons from my previous job was that systems like SCM 
should not make massive changes. Say we are about to close more than 30% of 
all pipelines: emit a warning and wait for human intervention, or slow down to 
the point that the closing is controlled. In fact, we should have a 
pipeline/container close rate, that is, we will not close more than x per 
unit time. It is also good to have a big red button, so that if the system 
gets into this state, the admin has the ability to stop this activity by SCM.
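
To make the three ideas above concrete (a fraction threshold, a close rate, 
and a big red button), here is a rough sketch of what such a guard could look 
like. This is only an illustration: PipelineCloseGuard, Decision, and 
requestClose are hypothetical names, not existing SCM classes or APIs, and 
the 30% threshold and the "x per unit time" values are plain constructor 
parameters.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical guard for pipeline closes; not part of the SCM code base. */
final class PipelineCloseGuard {
  enum Decision { ALLOW, RATE_LIMITED, NEEDS_OPERATOR, HALTED }

  private final double maxFraction;      // e.g. 0.30, the threshold from the comment
  private final int maxClosesPerWindow;  // the "x amount"
  private final long windowMillis;       // the "unit time"
  private final Deque<Long> recentCloses = new ArrayDeque<>();
  private boolean halted;                // the "big red button"

  PipelineCloseGuard(double maxFraction, int maxClosesPerWindow, long windowMillis) {
    this.maxFraction = maxFraction;
    this.maxClosesPerWindow = maxClosesPerWindow;
    this.windowMillis = windowMillis;
  }

  /** Admin stops all automatic closes until resume() is called. */
  synchronized void halt() { halted = true; }

  synchronized void resume() { halted = false; }

  /**
   * Asks for permission to close ONE pipeline.
   *
   * @param totalPipelines pipelines currently known to the caller
   * @param candidates     pipelines the caller wants to close in this round
   * @param nowMillis      current time, passed in so the guard is testable
   */
  synchronized Decision requestClose(int totalPipelines, int candidates, long nowMillis) {
    if (halted) {
      return Decision.HALTED;
    }
    // Too large a share of the cluster at once: warn and wait for a human.
    if (totalPipelines > 0 && candidates > maxFraction * totalPipelines) {
      return Decision.NEEDS_OPERATOR;
    }
    // Forget closes that have fallen out of the sliding window.
    while (!recentCloses.isEmpty()
        && nowMillis - recentCloses.peekFirst() >= windowMillis) {
      recentCloses.pollFirst();
    }
    if (recentCloses.size() >= maxClosesPerWindow) {
      return Decision.RATE_LIMITED;
    }
    recentCloses.addLast(nowMillis);
    return Decision.ALLOW;
  }
}
```

A caller would ask the guard before each close and act only on ALLOW; 
NEEDS_OPERATOR is the point where a warning would be emitted, and halt() is 
what an admin command could invoke. How this would hook into the actual 
stale-node handling is left open here.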

> GC other system pause events can trigger pipeline destroy for all the nodes 
> in the cluster
> ------------------------------------------------------------------------------------------
>
>                 Key: HDDS-1454
>                 URL: https://issues.apache.org/jira/browse/HDDS-1454
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: SCM
>    Affects Versions: 0.3.0
>            Reporter: Mukul Kumar Singh
>            Priority: Major
>              Labels: MiniOzoneChaosCluster
>
> In a MiniOzoneChaosCluster run it was observed that events like GC pauses or 
> any other pauses in SCM can mark all the datanodes as stale in SCM. This will 
> trigger multiple pipeline destroys and render the system unusable. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

