[ 
https://issues.apache.org/jira/browse/HDDS-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306806#comment-17306806
 ] 

Glen Geng commented on HDDS-5015:
---------------------------------

*The root cause here is*:

When localId is not set in the sequenceId table, SCM initializes it to 
UniqueId.next(). When 3 SCMs are set up from scratch, each of them individually 
sets its localId to its own UniqueId.next(), so the sequenceId diverges 
from the very beginning.
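
A minimal sketch of this divergence, under stated assumptions: unique_id_next() is a hypothetical, time-derived stand-in for UniqueId.next() (not the real Ozone implementation), and the sequenceId table is modeled as a plain dict. Each simulated SCM starts with an empty table and a slightly different clock, so each one stores a different localId.

```python
def unique_id_next(clock_millis: int, counter: int = 0) -> int:
    """Hypothetical stand-in for UniqueId.next(): an id derived from local time."""
    return (clock_millis << 16) | (counter & 0xFFFF)


def get_or_init_local_id(sequence_id_table: dict, clock_millis: int) -> int:
    """Initialize localId from the node's own clock if the table has no entry yet."""
    if "localId" not in sequence_id_table:
        sequence_id_table["localId"] = unique_id_next(clock_millis)
    return sequence_id_table["localId"]


def bootstrap_independently(clocks: list) -> list:
    """Each SCM starts from an empty table and initializes localId on its own."""
    tables = [{} for _ in clocks]
    return [get_or_init_local_id(t, c) for t, c in zip(tables, clocks)]


# Three nodes whose clocks differ by a few milliseconds (illustrative values).
local_ids = bootstrap_independently([1616400000000, 1616400000011, 1616400000025])
print(len(set(local_ids)))  # number of distinct localId values across the 3 nodes
```

With three different clock readings the three nodes end up with three different localId values, which is the inconsistency reported in this issue.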

*Short term solution is:*

Make the 3 SCMs agree on the localId.

*Long term solution is:*

The short term solution goes in first; the long term solution is tracked in 
HDDS-5016. During bootstrap, an SCM always downloads a checkpoint from the 
leader SCM and replaces its own scm.db with that of the leader.
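
The bootstrap step described for HDDS-5016 can be pictured roughly as follows. This is only a sketch under assumptions: scm.db is modeled as a nested dict, bootstrap_from_leader() is a hypothetical helper name, and copying the dict stands in for downloading a checkpoint. The two localId values are the differing values reported for scm1 and scm2 in the issue description; treating them as the localId entry is an assumption.

```python
import copy


def bootstrap_from_leader(leader_db: dict, local_db: dict) -> dict:
    """Discard the local state and return a private copy of the leader's state.

    local_db is intentionally ignored: the idea is that the bootstrapping
    node never keeps anything it generated on its own.
    """
    return copy.deepcopy(leader_db)  # stands in for "download checkpoint"


leader_db = {"sequenceId": {"localId": 105898712280731336}}
follower_db = {"sequenceId": {"localId": 105898723592162080}}

follower_db = bootstrap_from_leader(leader_db, follower_db)
print(follower_db == leader_db)  # the follower now mirrors the leader
```

Because the follower's own database is thrown away wholesale, any value it initialized locally (such as a time-derived localId) cannot survive bootstrap.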

 

*The short term solution is safe:*

Upgrading an in-memory SCM to a bypass-ratis SCM: not affected.

Upgrading an in-memory SCM to a single-node SCM: not affected.

Upgrading an in-memory SCM to a three-node SCM cluster: not supported yet.

Setting up a bypass-ratis SCM: not affected.

Setting up a three-node SCM cluster from scratch: fixed by the short term solution.

 

> SequenceID is not consistent when setup a multi node SCM HA cluster.
> --------------------------------------------------------------------
>
>                 Key: HDDS-5015
>                 URL: https://issues.apache.org/jira/browse/HDDS-5015
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: SCM HA
>            Reporter: Xu Shao Hong
>            Assignee: Glen Geng
>            Priority: Major
>
> We set up a three-node SCM HA cluster for testing.
> Using the ozone debug ldb tool, we found that the sequenceIDs are not the same across 
> the three SCMs. The cause is localID, which is initialized based on 
> each machine's own timestamp. 
> The ldb results fetched from scm.db on the 3 SCMs: 
> *scm1*
> 17000 END 
>  8000 END 
>  105898712280731336 END
> *scm2*
> 17000 END
>  8000 END
>  105898723592162080 END
> *scm3*
> 17000 END
>  8000 END
>  105898724336720504 END



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

