[
https://issues.apache.org/jira/browse/HDFS-9787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140175#comment-15140175
]
Jesse Yates commented on HDFS-9787:
-----------------------------------
Taking a quick look, this would imply that the non-primary SNN never sends a
checkpoint after the first time? A good test to ensure that this is the case is
to start the NNs, wait until the primary SNN is selected, and then remove it
from the cluster. Are any more checkpoints sent to the ANN?
My inclination is that you are correct, no (unless it takes a long time to
build the checkpoint), but I'd like to hear if that's actually the case. I
think the fix is to just set lastCheckpointTime in doCheckpoint() rather than
after each loop iteration.
> SNNs stop uploading FSImage to ANN once isPrimaryCheckPointer changed to
> false.
> -------------------------------------------------------------------------------
>
> Key: HDFS-9787
> URL: https://issues.apache.org/jira/browse/HDFS-9787
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha
> Affects Versions: 3.0.0
> Reporter: Guocui Mi
> Assignee: Guocui Mi
> Attachments: HDFS-9786-v000.patch
>
>
> SNNs stop uploading FSImage to ANN once isPrimaryCheckPointer becomes false.
> Here is the logic that decides whether to upload the FSImage.
> In StandbyCheckpointer.java
> boolean sendRequest = isPrimaryCheckPointer || secsSinceLast >=
> checkpointConf.getQuietPeriod();
> doCheckpoint(sendRequest);
> sendRequest is always false when isPrimaryCheckPointer is false, because
> secsSinceLast (~checkpointPeriod) >= checkpointConf.getQuietPeriod()
> (checkpointPeriod * this.quietMultiplier, default value 1.5) always
> evaluates to false.
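
To make the arithmetic in the description concrete, here is a minimal, self-contained sketch of the timing logic. It is a simplified model, not the actual StandbyCheckpointer code: the class name, the method countUploads, the trackLastUpload flag, and the 3600 s period / 60 s wake-up interval are all assumptions chosen for illustration. The trackLastUpload variant, which measures the quiet period from the last upload instead of the last checkpoint, is only one possible way to avoid the problem and is not claimed to be what the attached patch does.

```java
public class CheckpointTimingSketch {

    /**
     * Simulates a standby's checkpoint loop and counts uploads to the ANN.
     * Hypothetical model: the loop wakes every checkIntervalSecs, builds a
     * checkpoint once periodSecs have passed since the last one, and uploads
     * only if it is the primary checkpointer or the quiet period has elapsed.
     */
    public static int countUploads(boolean isPrimaryCheckPointer,
                                   long periodSecs,
                                   double quietMultiplier,
                                   long checkIntervalSecs,
                                   long totalSecs,
                                   boolean trackLastUpload) {
        long quietPeriodSecs = (long) (periodSecs * quietMultiplier);
        long lastCheckpointTime = 0;
        long lastUploadTime = 0;
        int uploads = 0;

        for (long now = checkIntervalSecs; now <= totalSecs; now += checkIntervalSecs) {
            long secsSinceLast = now - lastCheckpointTime;
            if (secsSinceLast < periodSecs) {
                continue; // not yet time to build a checkpoint
            }

            // As described in the issue, the quiet period is compared against
            // the time since the last checkpoint. The variant compares it
            // against the time since the last upload instead (an assumption).
            long quietClock = trackLastUpload ? now - lastUploadTime : secsSinceLast;
            boolean sendRequest = isPrimaryCheckPointer || quietClock >= quietPeriodSecs;

            if (sendRequest) {
                uploads++;
                lastUploadTime = now;
            }
            // Reset after every checkpoint, whether or not it was uploaded.
            lastCheckpointTime = now;
        }
        return uploads;
    }

    public static void main(String[] args) {
        long period = 3600;      // illustrative checkpoint period, seconds
        long check = 60;         // illustrative wake-up interval, seconds
        long day = 24 * 3600;    // simulate one day

        System.out.println("primary:               "
                + countUploads(true, period, 1.5, check, day, false));
        System.out.println("non-primary (as filed): "
                + countUploads(false, period, 1.5, check, day, false));
        System.out.println("non-primary (variant):  "
                + countUploads(false, period, 1.5, check, day, true));
    }
}
```

In this model the non-primary standby always sees secsSinceLast equal to 3600 at checkpoint time, which never reaches the 5400-second quiet period, so it uploads nothing over the simulated day. With the variant, it uploads on every second checkpoint.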