Kirill Sizov created HDDS-7706:
----------------------------------
Summary: OM Bootstrap is unable to add a node to a raft ring
Key: HDDS-7706
URL: https://issues.apache.org/jira/browse/HDDS-7706
Project: Apache Ozone
Issue Type: Bug
Components: OM HA
Affects Versions: 1.3.0
Reporter: Kirill Sizov
During the lifetime of an OM HA cluster, the nodes might receive peer list
update messages. This usually happens when a node goes down and then comes back
up. As long as all the nodes of the cluster are present in this list,
everything is fine.
However, if we bootstrap a new node (om4 in the following example) and it
replays the raft log on its side, such a message is fatal and causes the node
to exit. The replayed peer list does not include the new node, so it concludes
that it has been decommissioned and shuts down.
{noformat}
2022-12-22 19:45:17,386 [Thread[Thread-21,5,main]] INFO security.OzoneDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
2022-12-22 19:45:17,386 [Listener at om4address/9862] INFO om.OzoneManager: Version File has different layout version (3) than OM DB (null). That is expected if this OM has never been finalized to a newer layout version.
2022-12-22 19:45:17,387 [om4@group-942F8267F22A-StateMachineUpdater] INFO ratis.OzoneManagerStateMachine: Received Configuration change notification from Ratis. New Peer list:
[id: "om1"
address: "om1address:9872"
, id: "om3"
address: "om3address:9872"
, id: "om2"
address: "om2address:9872"
]
2022-12-22 19:45:17,387 [om4@group-942F8267F22A-StateMachineUpdater] ERROR om.OzoneManager: Fatal Error: Shutting down as OM has been decommissioned.
2022-12-22 19:45:17,388 [om4@group-942F8267F22A-StateMachineUpdater] ERROR om.OzoneManager: Terminating with exit status 1: Shutting down as OM has been decommissioned.
{noformat}
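The failure mode in the log above can be modelled with a minimal sketch. It
assumes the shutdown is triggered whenever the local node id is missing from a
replayed peer list. The class and method names (PeerListReplaySketch,
looksDecommissioned, firstFatalEntry) are hypothetical and are not Ozone's
actual code; the second log entry, which adds om4, is also an assumption made
for illustration.

```java
import java.util.List;

public class PeerListReplaySketch {

    /** Hypothetical stand-in for the decommission check; not Ozone's actual code. */
    static boolean looksDecommissioned(String selfId, List<String> peerIds) {
        return !peerIds.contains(selfId);
    }

    /**
     * Walks the configuration entries in log order and returns the index of
     * the first entry that would trigger the shutdown, or -1 if none does.
     */
    static int firstFatalEntry(String selfId, List<List<String>> configLog) {
        for (int i = 0; i < configLog.size(); i++) {
            if (looksDecommissioned(selfId, configLog.get(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // First entry: the historical peer list from the log above.
        // Second entry: an assumed later configuration that adds om4.
        List<List<String>> configLog = List.of(
            List.of("om1", "om3", "om2"),
            List.of("om1", "om2", "om3", "om4"));

        // An existing member never sees a list without itself.
        System.out.println("om1 fatal entry: " + firstFatalEntry("om1", configLog));
        // The bootstrapping node hits a list that predates it.
        System.out.println("om4 fatal entry: " + firstFatalEntry("om4", configLog));
    }
}
```

In this model om1 replays both entries without problems, while om4 stops at
the first entry, before it ever reaches the configuration that includes it.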
*Expected behavior:*
The new node should be able to replay the raft log correctly, without shutting
down.