[ https://issues.apache.org/jira/browse/HDFS-14361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793261#comment-16793261 ]
star commented on HDFS-14361:
-----------------------------

Let's go back to the original design doc [^Multiple-Standby-NameNodes_V1.pdf]. An SNN will get an HttpServletResponse.SC_CONFLICT status when the ANN has already downloaded the image file from another SNN. So it is not a serious issue if all SNNs send a checkpoint request to the ANN.

Furthermore, according to the comments above line 429, an SNN will send the download image request in 3 cases:
 * rollback request
 * are the checkpointer
 * are outside the quiet period

But in the patch, the SNN sends the download request only in the latter two cases. I think this is what causes HDFS-12248.
{code:java}
if (needCheckpoint) {
  // on all nodes, we build the checkpoint. However, we only ship the checkpoint if have a
  // rollback request, are the checkpointer, are outside the quiet period.
  final long secsSinceLastUpload = (now - lastUploadTime) / 1000;
  boolean sendRequest = isPrimaryCheckPointer
      || secsSinceLastUpload >= checkpointConf.getQuietPeriod();
  doCheckpoint(sendRequest);
  ...
}{code}
I agree to move the isPrimaryCheckPointer assignment outside of the 'if' block, to avoid an inconsistent state in which more than one SNN has isPrimaryCheckPointer = true, though that state will not break anything.

As to HDFS-12248, I think we may change sendRequest as follows:
{code:java}
boolean sendRequest = needRollbackCheckpoint || isPrimaryCheckPointer
    || secsSinceLastUpload >= checkpointConf.getQuietPeriod();
{code}
Thus all SNNs will send the request every time a rollback checkpoint is triggered. Otherwise, we should fix the comments.
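
To make the three cases concrete, here is a minimal standalone sketch of the proposed decision. It is not the actual Hadoop code: the class name CheckpointUploadDecision, the method shouldSendRequest, and the quiet period value used in main are hypothetical and exist only for illustration. The parameter names mirror the variables in the snippets above.

```java
// Hypothetical illustration only; this is not part of the Hadoop code base.
final class CheckpointUploadDecision {

    private CheckpointUploadDecision() {
    }

    /**
     * Returns true if the standby should ship the checkpoint it just built.
     * Mirrors the proposed expression: a rollback request, being the primary
     * checkpointer, or being outside the quiet period each suffice.
     */
    static boolean shouldSendRequest(boolean needRollbackCheckpoint,
                                     boolean isPrimaryCheckPointer,
                                     long secsSinceLastUpload,
                                     long quietPeriodSecs) {
        return needRollbackCheckpoint
            || isPrimaryCheckPointer
            || secsSinceLastUpload >= quietPeriodSecs;
    }

    public static void main(String[] args) {
        long quietPeriodSecs = 5400L; // assumed value, for illustration only

        // Rollback request: ship even if not primary and inside the quiet period.
        System.out.println(shouldSendRequest(true, false, 10L, quietPeriodSecs));

        // Not primary, no rollback, inside the quiet period: do not ship.
        System.out.println(shouldSendRequest(false, false, 10L, quietPeriodSecs));

        // Primary checkpointer: ship regardless of the quiet period.
        System.out.println(shouldSendRequest(false, true, 10L, quietPeriodSecs));

        // Quiet period has elapsed: ship even if not primary.
        System.out.println(shouldSendRequest(false, false, quietPeriodSecs, quietPeriodSecs));
    }
}
```

With the current expression (without needRollbackCheckpoint), the first call in main would return false, which is the mismatch between the code and its comment described above.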

> SNN will always upload fsimage
> ------------------------------
>
>                 Key: HDFS-14361
>                 URL: https://issues.apache.org/jira/browse/HDFS-14361
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, namenode
>    Affects Versions: 3.2.0
>            Reporter: hunshenshi
>            Priority: Major
>             Fix For: 3.2.0
>
>
> Related to -HDFS-12248.-
> {code:java}
> boolean sendRequest = isPrimaryCheckPointer
>     || secsSinceLastUpload >= checkpointConf.getQuietPeriod();
> doCheckpoint(sendRequest);
> {code}
> If sendRequest is true, SNN will upload fsimage. But isPrimaryCheckPointer
> always is true,
> {code:java}
> if (ie == null && ioe == null) {
>   //Update only when response from remote about success or
>   lastUploadTime = monotonicNow();
>   // we are primary if we successfully updated the ANN
>   this.isPrimaryCheckPointer = success;
> }
> {code}
> isPrimaryCheckPointer should be outside the if condition.
> If the ANN update was not successful, then isPrimaryCheckPointer should be
> set to false.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)