[
https://issues.apache.org/jira/browse/HDFS-14806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Erik Krogen updated HDFS-14806:
-------------------------------
Comment: was deleted
(was: | (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 21s{color} | {color:red} Docker failed to build yetus/hadoop:bdbca0e53b4. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14806 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12979466/HDFS-14806.002.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/27783/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |
This message was automatically generated.
)
> Bootstrap standby may fail if used in-progress tailing
> ------------------------------------------------------
>
> Key: HDFS-14806
> URL: https://issues.apache.org/jira/browse/HDFS-14806
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.3.0
> Reporter: Chen Liang
> Assignee: Chen Liang
> Priority: Major
> Attachments: HDFS-14806.001.patch, HDFS-14806.002.patch
>
>
> One issue we came across is that bootstrap standby can fail if in-progress
> tailing is enabled.
> When in-progress tailing is enabled, bootstrap uses the RPC mechanism to get
> edits. The config {{dfs.ha.tail-edits.qjm.rpc.max-txns}} sets an upper bound
> on how many transactions can be included in one RPC call. The default is
> 5000, meaning the bootstrapping NN (say NN1) can pull at most 5000 edits from
> the JNs. However, as part of bootstrap, NN1 queries another NN (say NN2) for
> NN2's current transaction ID, and NN2 may return a state that is more than
> 5000 txids ahead of NN1's current image. NN1 can still only see 5000 more
> txids from the JNs. At this point NN1 panics: the txid returned by the JNs
> is behind the state returned by NN2, so bootstrap fails.
> Essentially, bootstrap standby can fail if both of the following conditions
> are met:
> # in-progress tailing is enabled AND
> # the bootstrapping NN is too far (more than 5000 txids) behind
> Increasing the value of {{dfs.ha.tail-edits.qjm.rpc.max-txns}} to a very
> large value allowed bootstrap to continue, but this is hardly an ideal
> solution.
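
The arithmetic behind the failure condition in the description can be sketched as follows. This is only a simplified model, not the actual Hadoop code: the class, the method names, and the sample txids are all hypothetical, and it assumes that NN1 fetches edits with a single RPC whose size is capped by dfs.ha.tail-edits.qjm.rpc.max-txns (default 5000, as stated above).

```java
// Simplified model of the check described in the issue; NOT the actual
// Hadoop implementation. All names and sample values are hypothetical.
final class BootstrapCatchUpCheck {

    // Default of dfs.ha.tail-edits.qjm.rpc.max-txns, per the description.
    static final long DEFAULT_MAX_TXNS_PER_RPC = 5000L;

    /** Highest txid NN1 can see after one capped RPC fetch from the JNs. */
    static long highestReachableTxId(long imageTxId, long maxTxnsPerRpc) {
        return imageTxId + maxTxnsPerRpc;
    }

    /** True if the edits visible to NN1 reach the txid reported by NN2. */
    static boolean canCatchUp(long imageTxId, long nn2TxId, long maxTxnsPerRpc) {
        return highestReachableTxId(imageTxId, maxTxnsPerRpc) >= nn2TxId;
    }

    /** Smallest cap that would let this particular bootstrap proceed. */
    static long minRequiredMaxTxns(long imageTxId, long nn2TxId) {
        return Math.max(0L, nn2TxId - imageTxId);
    }

    public static void main(String[] args) {
        long imageTxId = 100_000L; // txid of NN1's current image (made-up value)
        long nn2TxId = 112_345L;   // txid reported by NN2 (made-up value)

        // NN1 can reach 105000, which is behind 112345, so this prints "false".
        System.out.println(canCatchUp(imageTxId, nn2TxId, DEFAULT_MAX_TXNS_PER_RPC));

        // The cap would need to be at least the gap, so this prints "12345".
        System.out.println(minRequiredMaxTxns(imageTxId, nn2TxId));
    }
}
```

Under this model, raising the cap only helps while it stays at or above the gap between NN1's image and NN2's state, which is why a fixed large value works as a workaround but does not remove the underlying problem.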
--
This message was sent by Atlassian Jira
(v8.3.2#803003)