[
https://issues.apache.org/jira/browse/HDDS-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xu Shao Hong updated HDDS-5916:
-------------------------------
Description:
During the chaos test, 10% of the DNs were killed to mimic a possible accident.
Env:
Kubernetes + PV + Prometheus
Phenomenon:
The key write rate dropped sharply and flattened into a near-horizontal line.
Even after the chaos injection ended, the rate did not recover.
In addition, the scm_pipeline_metrics_num_pipeline_allocated metric showed new
pipelines being created periodically without end. Datanodes kept holding leader
elections continuously and could not become stable even after a leader was
elected.
Reason:
The DN pods were killed, and each revived pod might not get the same IP address
as before. The SCM still receives heartbeats from them and treats them as
normal, because the DN UUID persisted on the PV is unchanged. However, the SCM
currently does not update the IP in DatanodeDetails, so it hands stale address
info to the datanodes in newly allocated pipelines.
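A minimal sketch of the proposed behavior, keyed by the persistent DN UUID. The class and method names below are stand-ins for illustration, not the actual Ozone/SCM APIs:

```java
// Hypothetical sketch: SCM-side tracking that refreshes a datanode's IP on
// heartbeat instead of keeping the address it registered with. Names are
// assumptions, not real Ozone classes.
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class NodeRegistry {
    // UUID -> last known IP address, as SCM might track in DatanodeDetails.
    private final Map<UUID, String> ipByUuid = new HashMap<>();

    /** Records the IP from a heartbeat; returns true if the IP changed. */
    public boolean onHeartbeat(UUID dnUuid, String reportedIp) {
        String previous = ipByUuid.put(dnUuid, reportedIp);
        return previous != null && !previous.equals(reportedIp);
    }

    /** Current address SCM would hand out for this datanode. */
    public String ipOf(UUID dnUuid) {
        return ipByUuid.get(dnUuid);
    }
}
```

With this kind of check, a pod revived under a new IP would be detected on its first heartbeat rather than silently kept at its old address.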
Consider a Raft group with three peers A, B, and C, where A was revived with a
new IP address. A can contact B and C, but B and C cannot contact A. A
therefore never receives heartbeats from the elected leader and gets stuck
oscillating between follower and candidate. Each time A becomes a candidate,
it increments the term, starts a leader election, and successfully delivers
RequestVote to B and C. On receiving a RequestVote with a higher term, the
leader steps down and a re-election follows. This explains why the Raft group
in the pipeline never stabilizes. Meanwhile, each short-lived leader can still
send the ready message to the SCM, so the SCM wrongly assumes the pipeline is
ready for chunk writes, causing blocking issues.
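The asymmetric-connectivity loop above can be modeled in a few lines. This is a toy simulation, not Ratis code, just to show why the term grows without bound:

```java
// Toy model of the election loop: peer A can reach B and C, but heartbeats
// from each elected leader never reach A (stale IP), so A's election timeout
// keeps firing and the term keeps rising.
public class ElectionLoop {

    /** Simulates `rounds` election timeouts at A; returns the final term. */
    public static int simulate(int rounds) {
        int term = 1;
        for (int i = 0; i < rounds; i++) {
            // A reachable peer (B or C) wins the election at `term`, but its
            // AppendEntries heartbeats to A are dropped, so A's election
            // timeout fires and A campaigns with an incremented term:
            term++;
            // The leader sees A's higher-term RequestVote and steps down,
            // so the group starts over and never stabilizes.
        }
        return term;
    }
}
```

Each round dethrones the current leader, matching the endless pipeline re-creation seen in the metrics.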
Possible solution:
Check the DatanodeDetails, either on the DN side or in the SCM, and update the
IP if necessary.
> DNs in pipeline raft group get stuck in infinite leader election in Kubernetes
> env
> ---------------------------------------------------------------------------------
>
> Key: HDDS-5916
> URL: https://issues.apache.org/jira/browse/HDDS-5916
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Xu Shao Hong
> Priority: Critical
> Attachments: wecom-temp-096bc77af479d5e6c280bbcaa35b7fe5.png,
> wecom-temp-56d8d0bcd030797a228dbb32e0dfa0f1.png,
> wecom-temp-5c5afba22bfcf188415ad622f82f66af.png
>
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]