[ 
https://issues.apache.org/jira/browse/HDFS-9787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140367#comment-15140367
 ] 

Guocui Mi commented on HDFS-9787:
---------------------------------

>>> this would imply that the non-primary SNN never sends a checkpoint after 
>>> the first time?
That is true, according to my observation.
I am trying to add a unit test to cover this scenario. Two other scenarios 
have triggered the problem in our cluster:
1) The primary checkpointer failed to upload the fsimage because the ANN was 
temporarily unavailable.
2) All NNs were restarted at the same time.

I am afraid the proposal you shared won't work.
1) Setting lastCheckpointTime before the following code in doCheckpoint() is no 
different from setting it after each loop iteration.
2) Setting it after the following code in doCheckpoint() makes the non-primary 
SNN do checkpoints one after another continuously, because lastCheckpointTime 
never gets updated.
if (!sendCheckpoint) { return; }

> SNNs stop uploading FSImage to ANN once isPrimaryCheckPointer changed to 
> false.
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-9787
>                 URL: https://issues.apache.org/jira/browse/HDFS-9787
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 3.0.0
>            Reporter: Guocui Mi
>            Assignee: Guocui Mi
>         Attachments: HDFS-9786-v000.patch
>
>
> SNNs stop uploading the FSImage to the ANN once isPrimaryCheckPointer becomes 
> false. Here is the logic that decides whether to upload the FSImage, in 
> StandbyCheckpointer.java:
>     boolean sendRequest = isPrimaryCheckPointer
>         || secsSinceLast >= checkpointConf.getQuietPeriod();
>     doCheckpoint(sendRequest);
> sendRequest is always false when isPrimaryCheckPointer is false, given that 
> secsSinceLast (~checkpointPeriod) >= checkpointConf.getQuietPeriod() 
> (checkpointPeriod * this.quietMultiplier, default value 1.5) always 
> evaluates to false.
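
The arithmetic in the description can be checked with a small standalone simulation. This is only a sketch, not the StandbyCheckpointer code: the class name CheckpointSketch, the method countUploads and all of its parameters are hypothetical. It assumes lastCheckpointTime is reset after every checkpoint, whether or not the image was uploaded, which is what keeps secsSinceLast close to the checkpoint period.

```java
public class CheckpointSketch {

    /**
     * Simulates the standby checkpoint loop on a fake clock and returns
     * how many checkpoints would have been uploaded to the ANN.
     * All names and parameters are hypothetical, for illustration only.
     */
    static int countUploads(boolean isPrimaryCheckPointer, long periodSecs,
                            double quietMultiplier, long checkIntervalSecs,
                            long totalSecs) {
        // Quiet period as described in the issue: period * multiplier.
        long quietPeriodSecs = (long) (periodSecs * quietMultiplier);
        long lastCheckpointTime = 0;
        int uploads = 0;

        for (long now = checkIntervalSecs; now <= totalSecs; now += checkIntervalSecs) {
            long secsSinceLast = now - lastCheckpointTime;
            if (secsSinceLast >= periodSecs) {
                // Same condition as the snippet quoted in the description.
                boolean sendRequest = isPrimaryCheckPointer
                        || secsSinceLast >= quietPeriodSecs;
                if (sendRequest) {
                    uploads++;
                }
                // Assumption: the timestamp is reset even when nothing was
                // uploaded, so secsSinceLast never grows past the period.
                lastCheckpointTime = now;
            }
        }
        return uploads;
    }

    public static void main(String[] args) {
        long day = 24 * 3600;
        System.out.println("primary uploads:     "
                + countUploads(true, 3600, 1.5, 60, day));
        System.out.println("non-primary uploads: "
                + countUploads(false, 3600, 1.5, 60, day));
    }
}
```

Under these assumptions (a one-hour period, a multiplier of 1.5, a check every 60 seconds), the non-primary case never uploads, because each checkpoint fires as soon as secsSinceLast reaches the period and resets it before it can reach the quiet period.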



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
