[
https://issues.apache.org/jira/browse/HDFS-14361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793261#comment-16793261
]
star commented on HDFS-14361:
-----------------------------
Let's go back to the original design doc [^Multiple-Standby-NameNodes_V1.pdf].
An SNN gets an HttpServletResponse.SC_CONFLICT status when the ANN has already
downloaded the image file from another SNN.
So it is not a serious issue if all SNNs send checkpoint requests to the ANN.
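To illustrate why duplicate uploads are harmless, here is a minimal sketch of the conflict behavior. It is not the actual ANN code: the class ConflictSketch, the method offerImage and the set of accepted transaction IDs are hypothetical, and the JDK constant HttpURLConnection.HTTP_CONFLICT (409, the same value as HttpServletResponse.SC_CONFLICT) is used only to keep the example self-contained.

```java
import java.net.HttpURLConnection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the ANN side of an image upload; not Hadoop code.
class ConflictSketch {
    // Transaction IDs for which an image has already been accepted (assumed model).
    private final Set<Long> acceptedTxIds = new HashSet<>();

    // Returns the HTTP status an SNN would see when offering an image at txId.
    synchronized int offerImage(long txId) {
        if (!acceptedTxIds.add(txId)) {
            // Another SNN already delivered this image: reject with 409.
            return HttpURLConnection.HTTP_CONFLICT;
        }
        return HttpURLConnection.HTTP_OK;
    }

    public static void main(String[] args) {
        ConflictSketch ann = new ConflictSketch();
        System.out.println("SNN-1 status: " + ann.offerImage(100L)); // 200
        System.out.println("SNN-2 status: " + ann.offerImage(100L)); // 409
    }
}
```

Under this model the second SNN only pays for one rejected request, and the ANN still ends up with exactly one image per transaction ID.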
Furthermore, an SNN will send the download image request, as stated in the
comment above line 429, in 3 cases:
* {color:#808080}rollback request{color}
* {color:#808080}are the checkpointer{color}
* {color:#808080}are outside the quiet period{color}

But from the patch, the SNN sends the download request only in the latter two
cases. I think this is what causes HDFS-12248.
{code:java}
if (needCheckpoint) {
  // on all nodes, we build the checkpoint. However, we only ship the
  // checkpoint if have a
  // rollback request, are the checkpointer, are outside the quiet period.
  final long secsSinceLastUpload = (now - lastUploadTime) / 1000;
  boolean sendRequest = isPrimaryCheckPointer
      || secsSinceLastUpload >= checkpointConf.getQuietPeriod();
  doCheckpoint(sendRequest);
  ...
}{code}
I agree with moving isPrimaryCheckPointer outside of the 'if' block to avoid an
inconsistent state in which more than one SNN has isPrimaryCheckPointer =
true, though it will not break anything.
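A minimal sketch of that change, under stated assumptions: the class PrimaryFlagSketch and the method recordUploadResult are hypothetical, the field names follow the snippet quoted in the issue description, and the current time is passed in as a parameter instead of calling monotonicNow(). This is not the actual patch.

```java
import java.io.IOException;

// Hypothetical holder for the two fields discussed in this issue; not Hadoop code.
class PrimaryFlagSketch {
    boolean isPrimaryCheckPointer = false;
    long lastUploadTime = 0L;

    // success: whether the ANN accepted our image; ie/ioe: errors seen while uploading.
    void recordUploadResult(boolean success, InterruptedException ie,
                            IOException ioe, long now) {
        if (ie == null && ioe == null) {
            // Only a clean round trip counts as an upload attempt for the quiet period.
            lastUploadTime = now;
        }
        // Assigned unconditionally, so a failed upload always clears the flag.
        isPrimaryCheckPointer = success;
    }

    public static void main(String[] args) {
        PrimaryFlagSketch snn = new PrimaryFlagSketch();
        snn.recordUploadResult(true, null, null, 10L);
        System.out.println(snn.isPrimaryCheckPointer); // true
        snn.recordUploadResult(false, null, new IOException("upload failed"), 20L);
        System.out.println(snn.isPrimaryCheckPointer); // false
    }
}
```

With the assignment inside the 'if' block, the second call would have left the flag at true.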
As to HDFS-12248, I think we may change sendRequest as follows:
{code:java}
boolean sendRequest = needRollbackCheckpoint || isPrimaryCheckPointer
|| secsSinceLastUpload >= checkpointConf.getQuietPeriod();
{code}
Thus all SNNs will send a request every time a rollback checkpoint is
triggered. Otherwise, we should fix the comment.
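The proposed condition can be checked in isolation with a small sketch. The class SendRequestSketch and the method shouldSendRequest are hypothetical, and the quiet period is passed in as a plain number of seconds instead of being read from checkpointConf.

```java
// Hypothetical pure-function version of the proposed sendRequest condition.
class SendRequestSketch {
    static boolean shouldSendRequest(boolean needRollbackCheckpoint,
                                     boolean isPrimaryCheckPointer,
                                     long secsSinceLastUpload,
                                     long quietPeriodSecs) {
        // Ship the checkpoint in any of the three cases named in the code comment.
        return needRollbackCheckpoint
            || isPrimaryCheckPointer
            || secsSinceLastUpload >= quietPeriodSecs;
    }

    public static void main(String[] args) {
        // Non-primary SNN inside the quiet period, no rollback: does not send.
        System.out.println(shouldSendRequest(false, false, 10L, 100L)); // false
        // Same SNN with a rollback checkpoint requested: sends.
        System.out.println(shouldSendRequest(true, false, 10L, 100L));  // true
        // Non-primary SNN once the quiet period has elapsed: sends.
        System.out.println(shouldSendRequest(false, false, 100L, 100L)); // true
    }
}
```

The second case is the one the current code misses: without needRollbackCheckpoint in the condition, a non-primary SNN inside the quiet period builds the rollback image but never ships it.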
> SNN will always upload fsimage
> ------------------------------
>
> Key: HDFS-14361
> URL: https://issues.apache.org/jira/browse/HDFS-14361
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha, namenode
> Affects Versions: 3.2.0
> Reporter: hunshenshi
> Priority: Major
> Fix For: 3.2.0
>
>
> Related to -HDFS-12248.-
> {code:java}
> boolean sendRequest = isPrimaryCheckPointer
> || secsSinceLastUpload >= checkpointConf.getQuietPeriod();
> doCheckpoint(sendRequest);
> {code}
> If sendRequest is true, the SNN will upload the fsimage. But
> isPrimaryCheckPointer is always true:
> {code:java}
> if (ie == null && ioe == null) {
> //Update only when response from remote about success or
> lastUploadTime = monotonicNow();
> // we are primary if we successfully updated the ANN
> this.isPrimaryCheckPointer = success;
> }
> {code}
> isPrimaryCheckPointer should be outside the if condition.
> If the ANN update was not successful, then isPrimaryCheckPointer should be
> set to false.