[ https://issues.apache.org/jira/browse/HDDS-9083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duong updated HDDS-9083:
------------------------
    Description: 
In an existing cluster, when upgrading for the first time to an Ozone version 
that uses Secret Keys, SCM and Datanodes seem to fall into a cyclic dependency 
during initialization.
 # SCM needs to receive (healthy) heartbeats from its known datanodes to escape 
safe mode. For now, SCM only initializes secret keys once an SCM leader is 
ready, and it looks like that doesn't happen until SCM escapes safe mode.
 # During startup, Datanodes need to get the current active secret keys. For 
now, this call (with a long chain of retries) is made before datanodes schedule 
heartbeat reporting.

  was:
In an existing cluster, for the first time upgrading to the Ozone version that 
uses Secret Keys, SCM and Datanodes seems to fall in a cyclic dependency during 
initialization.
 # SCM needs to receive heartbeats from its known datanodes to escape safe 
mode. For now, SCM only initializes secret keys on an SCM leader ready. And it 
looks like this doesn't happen until SCM escape safe mode.
 # During startup, Datanodes need to get the current active secret keys. For 
now, this call (with a long chain of retries) is made before datanodes schedule 
heartbeat reporting.


> Deadlock between SCM & Datanodes regarding secret keys initializing
> -------------------------------------------------------------------
>
>                 Key: HDDS-9083
>                 URL: https://issues.apache.org/jira/browse/HDDS-9083
>             Project: Apache Ozone
>          Issue Type: Improvement
>            Reporter: Duong
>            Priority: Major
>
> In an existing cluster, when upgrading for the first time to an Ozone 
> version that uses Secret Keys, SCM and Datanodes seem to fall into a cyclic 
> dependency during initialization.
>  # SCM needs to receive (healthy) heartbeats from its known datanodes to 
> escape safe mode. For now, SCM only initializes secret keys once an SCM 
> leader is ready, and it looks like that doesn't happen until SCM escapes 
> safe mode.
>  # During startup, Datanodes need to get the current active secret keys. For 
> now, this call (with a long chain of retries) is made before datanodes 
> schedule heartbeat reporting.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
