[
https://issues.apache.org/jira/browse/HDFS-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16788289#comment-16788289
]
Erik Krogen commented on HDFS-14317:
------------------------------------
I synced with [~ekanth] offline, and he informed me that in-progress edit log
tailing is not in branch-2, so we don't need this change there. My mistake.

[~ekanth], I do have one more concern that I did not think of earlier: will
this change accidentally cause the edit log to be rolled more frequently than
desired? Previously, if one Standby rolled the edit log, the others would reset
their timers (by updating {{lastLoadTimeMs}}) and so would skip rolling it
themselves. The same held if the Active rolled the edit log: all Standbys would
reset their timers.

Now, with this change, a Standby's timer is only updated when that Standby
rolls the edit log on its own. Thus the timer for Standby A will not be updated
if Standby B, or the Active, initiates a log roll. Rolling the edit log too
frequently won't hurt correctness, but it will create more edit log segments
than desired.
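
To make the concern concrete, here is a rough simulation of two Standbys under
both timer policies. It is only a sketch: the class, method, and parameter
names are hypothetical and are not taken from EditLogTailer, and it assumes a
1-second check interval, a 120-second roll period, and Standby B's timer
starting 60 seconds after Standby A's.

```java
// Hypothetical model of Standby roll timers; NOT the real EditLogTailer code.
final class RollTimerSketch {

    /**
     * Counts log rolls triggered by all Standbys over the simulated duration.
     *
     * @param resetOnAnyRoll true models the old behaviour (any roll resets every
     *                       Standby's timer); false models the new behaviour
     *                       (a roll resets only the timer of the Standby that
     *                       triggered it).
     * @param initialTimersMs one starting timer value per Standby (assumed).
     */
    static int countRolls(boolean resetOnAnyRoll, long periodMs, long tickMs,
                          long durationMs, long[] initialTimersMs) {
        long[] timers = initialTimersMs.clone();
        int rolls = 0;
        for (long now = 0; now <= durationMs; now += tickMs) {
            for (int i = 0; i < timers.length; i++) {
                // Same shape as the check quoted in the issue description.
                boolean due = periodMs >= 0 && (now - timers[i]) > periodMs;
                if (!due) {
                    continue;
                }
                rolls++;
                if (resetOnAnyRoll) {
                    // Old behaviour: every Standby observes the roll.
                    java.util.Arrays.fill(timers, now);
                } else {
                    // New behaviour: only the rolling Standby resets.
                    timers[i] = now;
                }
            }
        }
        return rolls;
    }

    public static void main(String[] args) {
        long[] starts = {0L, 60_000L};
        System.out.println("shared reset:       "
            + countRolls(true, 120_000L, 1_000L, 600_000L, starts));
        System.out.println("independent timers: "
            + countRolls(false, 120_000L, 1_000L, 600_000L, starts));
    }
}
```

Under these assumptions the independent timers produce twice as many rolls over
ten minutes (8 versus 4), i.e. roughly one extra segment per Standby per roll
period. Rolls initiated by the Active are not modelled here.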

Let me know if you agree with the above; if so, let's file a new JIRA to
address it.
> Standby does not trigger edit log rolling when in-progress edit log tailing
> is enabled
> --------------------------------------------------------------------------------------
>
> Key: HDFS-14317
> URL: https://issues.apache.org/jira/browse/HDFS-14317
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.9.0, 3.0.0
> Reporter: Ekanth Sethuramalingam
> Assignee: Ekanth Sethuramalingam
> Priority: Critical
> Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-14317.001.patch, HDFS-14317.002.patch,
> HDFS-14317.003.patch, HDFS-14317.004.patch
>
>
> The standby uses the following method to check if it is time to trigger edit
> log rolling on active.
> {code}
>   /**
>    * @return true if the configured log roll period has elapsed.
>    */
>   private boolean tooLongSinceLastLoad() {
>     return logRollPeriodMs >= 0 &&
>       (monotonicNow() - lastLoadTimeMs) > logRollPeriodMs;
>   }
> {code}
> In doTailEdits(), lastLoadTimeMs is updated whenever the standby
> successfully tails any edits:
> {code}
>   if (editsLoaded > 0) {
>     lastLoadTimeMs = monotonicNow();
>   }
> {code}
> The default configuration for {{dfs.ha.log-roll.period}} is 120 seconds and
> for {{dfs.ha.tail-edits.period}} is 60 seconds. With in-progress edit log
> tailing enabled, each tail that loads edits resets lastLoadTimeMs before the
> roll period elapses, so tooLongSinceLastLoad() will almost never return true.
> As a result, the edit log is not rolled for a long time, until
> {{dfs.namenode.edit.log.autoroll.multiplier.threshold}} takes effect.
> [In our deployment, this resulted in in-progress edit logs getting deleted.
> The sequence of events was that the standby was able to checkpoint twice
> while the in-progress edit log was growing on the active. When the
> NNStorageRetentionManager decided to clean up old checkpoints and edit logs,
> it removed the in-progress edit log from the active and QJM (as the txnid on
> the in-progress edit log was older than the 2 most recent checkpoints),
> resulting in irrecoverably losing a few minutes' worth of metadata].
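
The interaction described in the quoted issue can be illustrated with a small
simulation. This is a simplified sketch, not the actual tailer loop: the class
and method names are hypothetical, it assumes the standby checks the timer once
per tail period, that every in-progress tail finds new edits (a busy cluster),
and it ignores rolls initiated by the active, including the autoroll threshold.

```java
// Hypothetical model of the standby tail loop; NOT the real EditLogTailer code.
final class TailLoopSketch {
    // Defaults as stated in the issue description.
    static final long LOG_ROLL_PERIOD_MS = 120_000L; // dfs.ha.log-roll.period
    static final long TAIL_PERIOD_MS = 60_000L;      // dfs.ha.tail-edits.period

    // Same predicate as quoted in the issue, with time passed in explicitly.
    static boolean tooLongSinceLastLoad(long nowMs, long lastLoadTimeMs,
                                        long logRollPeriodMs) {
        return logRollPeriodMs >= 0
            && (nowMs - lastLoadTimeMs) > logRollPeriodMs;
    }

    /** Counts how often the standby would ask the active to roll its log. */
    static int countRollTriggers(boolean inProgressTailing, long durationMs) {
        long lastLoadTimeMs = 0L;
        boolean finalizedEditsAvailable = false;
        int triggers = 0;
        for (long now = TAIL_PERIOD_MS; now <= durationMs; now += TAIL_PERIOD_MS) {
            if (tooLongSinceLastLoad(now, lastLoadTimeMs, LOG_ROLL_PERIOD_MS)) {
                triggers++;
                // A roll finalizes the current segment, making it tailable.
                finalizedEditsAvailable = true;
            }
            // Assumption: in-progress tailing always finds new edits; otherwise
            // edits are only visible once a segment has been finalized.
            boolean editsLoaded = inProgressTailing || finalizedEditsAvailable;
            if (editsLoaded) {
                lastLoadTimeMs = now; // mirrors the doTailEdits() update
                finalizedEditsAvailable = false;
            }
        }
        return triggers;
    }

    public static void main(String[] args) {
        long oneHourMs = 3_600_000L;
        System.out.println("finalized-only tailing: "
            + countRollTriggers(false, oneHourMs));
        System.out.println("in-progress tailing:    "
            + countRollTriggers(true, oneHourMs));
    }
}
```

In this model the standby triggers 20 rolls in an hour when it tails only
finalized segments, and none at all with in-progress tailing, because the
60-second tail keeps resetting the timer before 120 seconds can elapse.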