[
https://issues.apache.org/jira/browse/HDFS-14806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968544#comment-16968544
]
Hudson commented on HDFS-14806:
-------------------------------
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17615 (See
[https://builds.apache.org/job/Hadoop-trunk-Commit/17615/])
HDFS-14806. Bootstrap standby may fail if with in-progress tailing. (cliang:
rev 9d0d580031006ca6db9b4150f17ab678ce68a257)
* (add)
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestBootstrapStandbyWithInProgressTailing.java
* (edit)
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/BootstrapStandby.java
* (edit)
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestBootstrapStandbyWithQJM.java
> Bootstrap standby may fail if used in-progress tailing
> ------------------------------------------------------
>
> Key: HDFS-14806
> URL: https://issues.apache.org/jira/browse/HDFS-14806
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.3.0
> Reporter: Chen Liang
> Assignee: Chen Liang
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14806.001.patch, HDFS-14806.002.patch,
> HDFS-14806.003.patch, HDFS-14806.004.patch
>
>
> One issue we ran into is that bootstrap standby can fail when in-progress
> tailing is enabled.
> When in-progress tailing is enabled, Bootstrap uses the RPC mechanism to get
> edits. The config {{dfs.ha.tail-edits.qjm.rpc.max-txns}} sets an upper bound
> on how many transactions can be included in one RPC call. The default is
> 5000, meaning the bootstrapping NN (say NN1) can pull at most 5000 edits from
> the JNs. However, as part of bootstrap, NN1 queries another NN (say NN2) for
> NN2's current transaction ID, and NN2 may return a state that is more than
> 5000 txids ahead of NN1's current image. NN1 can still only see 5000 more
> txids from the JNs. At this point NN1 gives up: the txid returned by the JNs
> is behind the state NN2 returned, so bootstrap fails.
> Essentially, bootstrap standby can fail if both of the following conditions
> are met:
> # in-progress tailing is enabled AND
> # the bootstrapping NN is too far (>5000 txids) behind
> Increasing the value of {{dfs.ha.tail-edits.qjm.rpc.max-txns}} to a very
> large value allowed bootstrap to continue, but this is hardly an ideal
> solution.
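
The failure condition described above can be illustrated with a small toy
model. This is only a sketch of the arithmetic in the description, not the
actual BootstrapStandby code: the class and method names are hypothetical, and
it assumes that without in-progress tailing no per-RPC cap applies.

```java
/** Toy model of the failure condition in HDFS-14806; hypothetical names, not Hadoop code. */
final class BootstrapGapModel {
    /** Default of dfs.ha.tail-edits.qjm.rpc.max-txns, per the issue description. */
    static final int DEFAULT_MAX_TXNS = 5000;

    /**
     * Highest txid the bootstrapping NN (NN1) can see from the JNs in this model.
     * With in-progress tailing, one RPC returns at most maxTxns edits past the image.
     * Assumption: without in-progress tailing there is no such cap.
     */
    static long highestVisibleTxId(long imageTxId, int maxTxns, boolean inProgressTailing) {
        if (!inProgressTailing) {
            return Long.MAX_VALUE;
        }
        return imageTxId + maxTxns;
    }

    /** True when the state reported by the other NN (NN2) is beyond what NN1 can see. */
    static boolean bootstrapWouldFail(long imageTxId, long remoteTxId,
                                      int maxTxns, boolean inProgressTailing) {
        return highestVisibleTxId(imageTxId, maxTxns, inProgressTailing) < remoteTxId;
    }

    public static void main(String[] args) {
        // NN1 image at txid 1000, NN2 reports txid 7500: a gap of 6500 > 5000.
        System.out.println(bootstrapWouldFail(1_000L, 7_500L, DEFAULT_MAX_TXNS, true));  // true
        // Same gap with in-progress tailing disabled: no cap in this model.
        System.out.println(bootstrapWouldFail(1_000L, 7_500L, DEFAULT_MAX_TXNS, false)); // false
        // Workaround from the description: raise the cap above the gap.
        System.out.println(bootstrapWouldFail(1_000L, 7_500L, 10_000, true));            // false
    }
}
```

In this model, raising the cap only helps while it stays above the gap between
NN1's image and NN2's state, which is consistent with the description calling
the workaround less than ideal.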