[ 
https://issues.apache.org/jira/browse/HDDS-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Shao Hong reassigned HDDS-5916:
----------------------------------

    Assignee:     (was: Xu Shao Hong)

> DNs in pipeline raft group get stuck in infinite leader election in Kubernetes 
> env
> ---------------------------------------------------------------------------------
>
>                 Key: HDDS-5916
>                 URL: https://issues.apache.org/jira/browse/HDDS-5916
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Xu Shao Hong
>            Priority: Critical
>              Labels: kubernetes, pull-request-available
>         Attachments: wecom-temp-096bc77af479d5e6c280bbcaa35b7fe5.png, 
> wecom-temp-56d8d0bcd030797a228dbb32e0dfa0f1.png, 
> wecom-temp-5c5afba22bfcf188415ad622f82f66af.png
>
>
> During a chaos test, 10% of the DNs were killed to mimic a possible failure. 
> Env:
> Kubernetes + PV + Prometheus
>  
> Phenomenon:
> The key write rate dropped sharply and flattened into a horizontal line. 
> Even after the chaos injection had ended, the rate did not recover.
> In addition, the scm_pipeline_metrics_num_pipeline_allocated metric showed 
> new pipelines being created periodically without end. 
> Datanodes were holding leader elections continuously, and the raft groups 
> did not become stable even after a leader was elected.
>  
> Reason:
> The DN pods were killed, and a revived pod might not come back with the same 
> IP address as before. The SCM still receives heartbeats from these pods and 
> treats them as healthy because the DN UUID, which lives on the PV, does not 
> change. The SCM currently does not update the IP in the DatanodeDetails, so 
> it hands out stale addresses for these datanodes in newly allocated pipelines. 
> For example, take a raft group with three peers A, B and C. A was revived 
> and got a new IP address. A can contact B and C, but B and C cannot contact 
> A. A therefore never receives heartbeats from the leader (B or C) and keeps 
> switching between follower and candidate. Each time A becomes a candidate, 
> it increments the term, starts a leader election and successfully sends its 
> requestVote to B and C. Once the leader receives the requestVote, it steps 
> down and a new election follows. This explains why the raft group in the 
> pipeline never stabilizes.
> Meanwhile, a short-lived leader can send the ready message to the SCM, and 
> the SCM wrongly concludes that the pipeline is ready for chunk writes, 
> which leads to blocked writes.
>  
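
The election loop described above can be reproduced with a toy model. This is only a sketch under simplifying assumptions, not Ratis or Ozone code: the class StaleAddressElectionSketch, the Peer type and the round-based timing are all hypothetical, B always wins when no leader exists, and A's outbound messages always arrive while inbound messages to A are dropped when its address is stale.

```java
final class StaleAddressElectionSketch {
    enum Role { FOLLOWER, CANDIDATE, LEADER }

    /** Hypothetical raft peer that tracks only its term and role. */
    static final class Peer {
        final boolean reachable; // false: the other peers hold a stale IP for this peer
        long term = 0;
        Role role = Role.FOLLOWER;

        Peer(boolean reachable) {
            this.reachable = reachable;
        }

        /** Raft rule: seeing a higher term forces a peer back to follower. */
        void observeTerm(long remoteTerm) {
            if (remoteTerm > term) {
                term = remoteTerm;
                role = Role.FOLLOWER;
            }
        }
    }

    /** Returns how often an established leader was forced to step down. */
    static int simulate(int rounds, boolean aReachable) {
        Peer a = new Peer(aReachable);
        Peer b = new Peer(true);
        Peer c = new Peer(true);
        int stepDowns = 0;
        for (int round = 0; round < rounds; round++) {
            // 1. With no leader, B wins an election using C's vote (B + C are a majority).
            if (b.role != Role.LEADER) {
                b.term++;
                b.role = Role.LEADER;
                c.observeTerm(b.term);
            }
            // 2. The leader's heartbeat reaches A only if A's known address is valid.
            if (a.reachable) {
                a.observeTerm(b.term);
                continue;
            }
            // 3. A times out, bumps its term and sends requestVote to B and C.
            //    Replies to A are lost, so A never learns the current term.
            a.term++;
            a.role = Role.CANDIDATE;
            for (Peer p : new Peer[] {b, c}) {
                boolean wasLeader = p.role == Role.LEADER;
                p.observeTerm(a.term);
                if (wasLeader && p.role != Role.LEADER) {
                    stepDowns++;
                }
            }
        }
        return stepDowns;
    }

    public static void main(String[] args) {
        System.out.println("stale address:   " + simulate(10, false) + " step-downs");
        System.out.println("correct address: " + simulate(10, true) + " step-downs");
    }
}
```

In this model the leader is deposed repeatedly while A's address is stale, and never when A is reachable, which matches the behaviour reported in the issue.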
> Possible solution:
> Have the SCM check the DatanodeDetails and update the IP if it has changed.
>  
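
A minimal sketch of the proposed fix, assuming the SCM keeps one address per datanode UUID and compares it with the address seen on each heartbeat. NodeAddressRegistrySketch, onHeartbeat and currentIp are hypothetical names used for illustration and are not the Ozone SCM API.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

final class NodeAddressRegistrySketch {
    // Latest known IP per datanode UUID (the UUID survives pod restarts via the PV).
    private final Map<UUID, String> ipByUuid = new ConcurrentHashMap<>();

    /**
     * Records the IP reported with a heartbeat.
     * Returns true when an already known node reported a different IP, so the
     * caller can refresh any pipeline information that still carries the old one.
     */
    boolean onHeartbeat(UUID uuid, String reportedIp) {
        String previous = ipByUuid.put(uuid, reportedIp);
        return previous != null && !previous.equals(reportedIp);
    }

    /** Returns the most recently reported IP, or null for an unknown node. */
    String currentIp(UUID uuid) {
        return ipByUuid.get(uuid);
    }

    public static void main(String[] args) {
        NodeAddressRegistrySketch registry = new NodeAddressRegistrySketch();
        UUID dn = UUID.randomUUID();
        registry.onHeartbeat(dn, "10.0.0.5");
        // The pod was rescheduled and now reports a new address.
        boolean changed = registry.onHeartbeat(dn, "10.0.0.9");
        System.out.println("changed=" + changed + ", ip=" + registry.currentIp(dn));
    }
}
```

Keying on the UUID rather than the IP mirrors the observation in the issue that the UUID is the stable identity of a datanode in this environment.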



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

