[
https://issues.apache.org/jira/browse/HDDS-10945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sumit Agrawal updated HDDS-10945:
---------------------------------
Description:
One of the OMs is continuously in an election state when 4 OMs are configured,
since a Ratis quorum expects an odd number of nodes, but 4 nodes are configured.
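For context on the node count: a majority quorum of n voting peers needs
floor(n/2) + 1 votes, so 4 peers tolerate no more failures than 3 do. Below is
a minimal sketch of that arithmetic. QuorumMath and its methods are
illustrative names only, not part of Ratis or Ozone.

```java
// Illustrative only: QuorumMath is a hypothetical helper, not a Ratis or Ozone class.
final class QuorumMath {
    private QuorumMath() { }

    /** Votes needed for a strict majority among n voting peers. */
    static int majority(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("peer count must be positive: " + n);
        }
        return n / 2 + 1;
    }

    /** Number of peers that may fail while a majority can still be formed. */
    static int tolerableFailures(int n) {
        return n - majority(n);
    }

    public static void main(String[] args) {
        // Compare 3, 4 and 5 peers: the even count adds a voter but no extra tolerance.
        for (int n = 3; n <= 5; n++) {
            System.out.printf("peers=%d majority=%d tolerableFailures=%d%n",
                n, majority(n), tolerableFailures(n));
        }
    }
}
```

With 3 peers the majority is 2 and one failure is tolerated. With 4 peers the
majority is 3 and still only one failure is tolerated.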
{code:java}
ozone-om.sl0548.log (OM2 Leader):
[..]
2023-11-08 16:28:40,670 INFO
[grpc-default-executor-29]-org.apache.ratis.server.RaftServer$Division:
om2@group-278936BBF583: receive requestVote(PRE_VOTE, om3, group-278936BBF583,
2248468, (t:2248467, i:450468))
2023-11-08 16:28:40,670 INFO
[grpc-default-executor-29]-org.apache.ratis.server.impl.VoteContext:
om2@group-278936BBF583-LEADER: reject PRE_VOTE from om3: this server is the
leader and still has leadership
2023-11-08 16:28:40,670 INFO
[grpc-default-executor-29]-org.apache.ratis.server.RaftServer$Division:
om2@group-278936BBF583 replies to PRE_VOTE vote request:
om3<-om2#0:FAIL-t2248468. Peer's state: om2@group-278936BBF583:t2248468,
leader=om2, voted=om2,
raftlog=Memoized:om2@group-278936BBF583-SegmentedRaftLog:OPENED:c464230,
conf=450469:
peers:[om1|rpc:sl0547.sii24.pole-emploi.intra:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER,
om3|rpc:sl0549.sii24.pole-emploi.intra:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER,
om2|rpc:sl0548.sii24.pole-emploi.intra:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
old=null
[..] {code}
OMs state:
sl0546:~> sudo ozone admin om getserviceroles -id=ozone01
om1 : FOLLOWER (sl0546.sii24.pole-emploi.intra)
om3 : FOLLOWER (sl0549.sii24.pole-emploi.intra)
om2 : LEADER (sl0548.sii24.pole-emploi.intra)
Observed:
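This looks like a configuration issue: at least four hostnames (sl0546, sl0547,
sl0548 and sl0549) appear for three OMs. The Ratis peer list places om1 on
sl0547, while getserviceroles reports om1 on sl0546.
Validation should therefore be added so that the configured OM node count is an
odd number.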
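The proposed check could look roughly like the sketch below. It is not the
actual Ozone change: OmPeerCountValidator and validate() are hypothetical
names, the node ID list is assumed to have been parsed already from the OM HA
configuration for service ozone01, and "om4" is a made-up fourth ID used only
to trigger the failure.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: this class is not part of Ozone.
final class OmPeerCountValidator {
    private OmPeerCountValidator() { }

    /**
     * Rejects an OM node ID list that is empty, contains duplicates,
     * or has an even number of entries.
     */
    static void validate(List<String> omNodeIds) {
        if (omNodeIds == null || omNodeIds.isEmpty()) {
            throw new IllegalArgumentException("No OM node IDs configured");
        }
        Set<String> unique = new LinkedHashSet<>(omNodeIds);
        if (unique.size() != omNodeIds.size()) {
            throw new IllegalArgumentException(
                "Duplicate OM node IDs configured: " + omNodeIds);
        }
        if (unique.size() % 2 == 0) {
            throw new IllegalArgumentException(
                "OM node count should be odd, but " + unique.size()
                    + " nodes are configured: " + unique);
        }
    }

    public static void main(String[] args) {
        // Three node IDs, as reported by getserviceroles in this issue.
        validate(Arrays.asList("om1", "om2", "om3"));
        System.out.println("3 nodes: accepted");

        // "om4" is an assumed ID standing in for the unexpected fourth node.
        try {
            validate(Arrays.asList("om1", "om2", "om3", "om4"));
        } catch (IllegalArgumentException e) {
            System.out.println("4 nodes: rejected: " + e.getMessage());
        }
    }
}
```

Whether such a check should fail OM startup or only log a warning is a design
choice this sketch leaves open. It throws so that the mis-configuration is
visible immediately.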
was:
One of the OMs is continuously in an election state when 4 OMs are configured,
since a Ratis quorum expects an odd number of nodes, but 4 nodes are configured.
{code:java}
ozone-om.sl0548.log (OM2 Leader):
[..]
2023-11-08 16:28:40,670 INFO
[grpc-default-executor-29]-org.apache.ratis.server.RaftServer$Division:
om2@group-278936BBF583: receive requestVote(PRE_VOTE, om3, group-278936BBF583,
2248468, (t:2248467, i:450468))
2023-11-08 16:28:40,670 INFO
[grpc-default-executor-29]-org.apache.ratis.server.impl.VoteContext:
om2@group-278936BBF583-LEADER: reject PRE_VOTE from om3: this server is the
leader and still has leadership
2023-11-08 16:28:40,670 INFO
[grpc-default-executor-29]-org.apache.ratis.server.RaftServer$Division:
om2@group-278936BBF583 replies to PRE_VOTE vote request:
om3<-om2#0:FAIL-t2248468. Peer's state: om2@group-278936BBF583:t2248468,
leader=om2, voted=om2,
raftlog=Memoized:om2@group-278936BBF583-SegmentedRaftLog:OPENED:c464230,
conf=450469:
peers:[om1|rpc:sl0547.sii24.pole-emploi.intra:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER,
om3|rpc:sl0549.sii24.pole-emploi.intra:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER,
om2|rpc:sl0548.sii24.pole-emploi.intra:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
old=null
[..] {code}
OMs state:
sl0546:~> sudo ozone admin om getserviceroles -id=ozone01
om1 : FOLLOWER (sl0546.sii24.pole-emploi.intra)
om3 : FOLLOWER (sl0549.sii24.pole-emploi.intra)
om2 : LEADER (sl0548.sii24.pole-emploi.intra)
> validate mis-configuration for peer node count in OM
> ----------------------------------------------------
>
> Key: HDDS-10945
> URL: https://issues.apache.org/jira/browse/HDDS-10945
> Project: Apache Ozone
> Issue Type: Improvement
> Components: Ozone Manager
> Reporter: Sumit Agrawal
> Assignee: Sumit Agrawal
> Priority: Minor