[ https://issues.apache.org/jira/browse/HDFS-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589781#comment-13589781 ]
Daryn Sharp commented on HDFS-4477:
-----------------------------------

The quick fix is marred by a race condition I was concerned about. Kihwal and I have studied the problem and found it's much worse than originally thought.

The NN rolls the edits, then the 2NN downloads the image and the rolled edits. Tokens due to expire during the download, but actually renewed during the download, will erroneously be removed from the image because the 2NN doesn't know about those edits. The 2NN will then fail all future checkpoints because it can't apply edits for the non-existent token. From then on, the 2NN will try to checkpoint every minute and always fail.

Tokens are renewed at 90% of the expiration. With the default 24h, that's a 2.4h window in which the checkpoint downloads must occur. If the window is blown, you can try to delete the current fsimage on the NN, bounce the 2NN to clear its internal state, and let the 2NN use the prior image and reapply all the older and newer edits. However, if the checkpoint blew the 2.4h window because of anything other than transient load or network congestion, it's going to blow the window again. Recovering will then require NN downtime to force a save of the namespace.

Under normal load, some of our grids routinely take 1.5h+ to checkpoint due to the size of our images/edits and a throttled download that avoids saturating the NIC. Under heavy load, we are almost certain to lose the race. Likewise, if the 2NN is out of commission for long, we will hit this issue. Incurring at least 15m of cluster downtime is not an option. We need another solution...
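A minimal Python model of the race may help. It is a hypothetical sketch, not Hadoop code: `Image`, `apply_edits`, `prune_expired`, `next_checkpoint_succeeds`, and the edit tuples are invented for illustration, and it assumes the quick fix prunes any token whose expiry has passed when the 2NN saves its merged image. The renewal window is the 10% of the 24h expiration left when tokens are renewed at 90%, i.e. 2.4h.

```python
"""Hypothetical model of the HDFS-4477 checkpoint race; not Hadoop code."""
from dataclasses import dataclass, field

HOUR = 3600                  # seconds
EXPIRY_INTERVAL = 24 * HOUR  # default token expiration from the comment
RENEW_PERCENT = 90           # clients renew at 90% of the expiration


def renewal_window(interval: int, renew_percent: int) -> int:
    """Seconds between a token's renewal and its previous expiry time."""
    return interval * (100 - renew_percent) // 100


@dataclass
class Image:
    tokens: dict = field(default_factory=dict)  # token id -> expiry (seconds)


def apply_edits(image: Image, edits: list) -> None:
    """Replay (op, token_id, expiry) edits; renewing an unknown token fails."""
    for op, token_id, expiry in edits:
        if op == "RENEW" and token_id not in image.tokens:
            raise KeyError(f"cannot renew unknown token {token_id}")
        image.tokens[token_id] = expiry  # "GET" and "RENEW" both set the expiry


def prune_expired(image: Image, now: float) -> None:
    """Assumed quick-fix behavior: drop tokens whose expiry has passed."""
    image.tokens = {t: e for t, e in image.tokens.items() if e > now}


def next_checkpoint_succeeds(download_hours: float) -> bool:
    """The NN rolls edits at t=0; the 2NN finishes merging after download_hours."""
    renew_at = HOUR // 2  # the renewal lands just after the roll
    old_expiry = renew_at + renewal_window(EXPIRY_INTERVAL, RENEW_PERCENT)
    new_expiry = renew_at + EXPIRY_INTERVAL

    # Image plus rolled edits: the 2NN only sees the token's old expiry.
    image = Image()
    apply_edits(image, [("GET", "t1", old_expiry)])
    # The renewal sits in the in-progress segment the 2NN has not downloaded.
    pending = [("RENEW", "t1", new_expiry)]

    prune_expired(image, now=download_hours * HOUR)
    try:
        apply_edits(image, pending)  # replayed by the following checkpoint
        return True
    except KeyError:
        return False


print(next_checkpoint_succeeds(1.5))  # inside the 2.4h window -> True
print(next_checkpoint_succeeds(3.0))  # window blown -> False
```

With a 1.5h download the renewed token survives the prune. At 3h the download outlasts the 2.4h window, the token is dropped from the merged image, and replaying its renewal fails, which is the error the 2NN would then hit on every retry.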
> Secondary namenode may retain old tokens
> ----------------------------------------
>
>                 Key: HDFS-4477
>                 URL: https://issues.apache.org/jira/browse/HDFS-4477
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>            Reporter: Kihwal Lee
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-4477.patch, HDFS-4477.patch
>
>
> Upon inspection of an fsimage created by a secondary namenode, we've
> discovered it contains very old tokens. These are probably the ones that were
> not explicitly canceled. It may be related to the optimization done to avoid
> loading the fsimage from scratch on every checkpoint.