[
https://issues.apache.org/jira/browse/HDFS-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111257#comment-15111257
]
Kihwal Lee commented on HDFS-9678:
----------------------------------
If the regular/recommended upgrade process is successfully followed, users will
not run into this. But if the standby is restarted/rebuilt in the middle of an
upgrade, this can happen.
We could simply make the namenode reset the flag by calling
{{setNeedRollbackFsImage(false)}} when replaying
{{OP_ROLLING_UPGRADE_FINALIZE}}.
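To illustrate the proposal, here is a minimal, self-contained toy model of the flag handling. It is a sketch only, not the actual namenode or edit log loader code: the class {{RollbackFlagSketch}}, the {{clearOnFinalize}} switch, and the methods {{replay}}, {{checkpointIfNeeded}} and {{countCheckpoints}} are hypothetical names made up for this example. It also only counts checkpoints forced by the flag and ignores the normal time/transaction based triggers.

```java
/** Toy model of the needRollbackFsImage handling; not real HDFS code. */
class RollbackFlagSketch {
    enum Op { OP_ROLLING_UPGRADE_START, OP_ROLLING_UPGRADE_FINALIZE, OTHER }

    private boolean needRollbackFsImage;
    private boolean inRollingUpgrade;
    // Hypothetical switch: true models the proposed fix, false the current behavior.
    private final boolean clearOnFinalize;

    RollbackFlagSketch(boolean clearOnFinalize) {
        this.clearOnFinalize = clearOnFinalize;
    }

    /** Models the edit log loader replaying a single op. */
    void replay(Op op) {
        switch (op) {
            case OP_ROLLING_UPGRADE_START:
                inRollingUpgrade = true;
                needRollbackFsImage = true;
                break;
            case OP_ROLLING_UPGRADE_FINALIZE:
                inRollingUpgrade = false;
                if (clearOnFinalize) {
                    // Proposed change: reset the flag when the upgrade is finalized.
                    needRollbackFsImage = false;
                }
                break;
            default:
                break;
        }
    }

    /** Models one checkpointer cycle; returns true if the flag forced a checkpoint. */
    boolean checkpointIfNeeded() {
        if (!needRollbackFsImage) {
            return false;
        }
        if (inRollingUpgrade) {
            // A rollback image is created, so the flag is cleared.
            needRollbackFsImage = false;
        }
        // Outside of an upgrade only a regular checkpoint is made; the flag stays set.
        return true;
    }

    /** Standby replays START and FINALIZE back to back, then runs the checkpointer. */
    static int countCheckpoints(boolean clearOnFinalize, int cycles) {
        RollbackFlagSketch nn = new RollbackFlagSketch(clearOnFinalize);
        nn.replay(Op.OP_ROLLING_UPGRADE_START);
        nn.replay(Op.OP_ROLLING_UPGRADE_FINALIZE);
        int forced = 0;
        for (int i = 0; i < cycles; i++) {
            if (nn.checkpointIfNeeded()) {
                forced++;
            }
        }
        return forced;
    }

    public static void main(String[] args) {
        System.out.println("forced checkpoints without reset: " + countCheckpoints(false, 5));
        System.out.println("forced checkpoints with reset:    " + countCheckpoints(true, 5));
    }
}
```

In this model, a standby that replays both ops before its checkpointer gets a chance to run is forced to checkpoint on every cycle unless the flag is reset on finalize.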
> Standby NN sometimes does not clear needRollbackFsImage
> -------------------------------------------------------
>
> Key: HDFS-9678
> URL: https://issues.apache.org/jira/browse/HDFS-9678
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Kihwal Lee
>
> When the edit log loader sees {{OP_ROLLING_UPGRADE_START}}, it calls
> {{setNeedRollbackFsImage(true)}}. This is cleared on a standby NN only by the
> checkpointer thread when it actually creates a rollback image.
> On {{OP_ROLLING_UPGRADE_FINALIZE}}, the rolling upgrade is finalized, but
> {{needRollbackFsImage}} is not cleared if a rollback image was never
> created. This results in perpetual checkpointing by the standby NN.
> The standby NN thinks it needs to do checkpointing because it needs to create
> a rollback image, but since it is not in upgrade mode, it creates a regular
> checkpoint, not a rollback image. As a result, the status is not cleared even
> after creating a checkpoint.
> The standby will keep checkpointing back-to-back and they will get uploaded
> to the active constantly. We noticed this because of increased sync time on
> the active.