[jira] [Commented] (HDFS-14806) Bootstrap standby may fail if used in-progress tailing

Erik Krogen (Jira) Wed, 04 Sep 2019 13:33:27 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-14806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922841#comment-16922841
 ]


Erik Krogen commented on HDFS-14806:
------------------------------------

With the streaming mechanism, the bootstrapper sends one RPC (to get the 
manifest), then sets up one HTTP connection for the entire range of edits, but 
doesn't read any data from it. So the total cost (per JN) is 1 RPC + 1 HTTP 
connection setup, and does not scale based on the number of transactions. The 
actual transaction data isn't read by the bootstrapper -- just the metadata.

With the RPC mechanism, the bootstrapper sends {{(txn count)/5000}} RPCs. Even 
worse, each of these RPCs is large, since each one actually contains 5000 
transactions. So you not only pay the round-trip time of many RPCs, you also 
actually load all of the transaction data.

So basically the original intention was to do an O(1) metadata lookup, but 
using RPCs we end up with an O(N) data transfer to achieve the same thing.
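
To make the comparison concrete, here is a rough sketch of the request counts described above. It is only a back-of-the-envelope model, not code from the patch: the class and method names are hypothetical, the batch size of 5000 is the default quoted in the issue description, and rounding up to a whole RPC for a partial batch is my assumption.

```java
class BootstrapCost {
    // Default of dfs.ha.tail-edits.qjm.rpc.max-txns as quoted in the issue description.
    static final long DEFAULT_MAX_TXNS_PER_RPC = 5000;

    /**
     * RPC mechanism: one RPC per batch of up to maxTxnsPerRpc transactions.
     * Ceiling division is an assumption for counts that are not a multiple of the batch size.
     */
    static long requestsForRpcMechanism(long txnCount, long maxTxnsPerRpc) {
        return (txnCount + maxTxnsPerRpc - 1) / maxTxnsPerRpc;
    }

    /**
     * Streaming mechanism: 1 RPC for the manifest + 1 HTTP connection setup per JN,
     * independent of how many transactions the range covers.
     */
    static long requestsForStreamingMechanism(long txnCount) {
        return 2;
    }

    public static void main(String[] args) {
        for (long txns : new long[] {5_000L, 100_000L, 10_000_000L}) {
            System.out.printf("txns=%d rpcMechanism=%d streaming=%d%n",
                txns,
                requestsForRpcMechanism(txns, DEFAULT_MAX_TXNS_PER_RPC),
                requestsForStreamingMechanism(txns));
        }
    }
}
```

The request count alone understates the gap, since each RPC in the first case also carries the full transaction payload while the streaming case only touches metadata.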

> Bootstrap standby may fail if used in-progress tailing
> ------------------------------------------------------
>
>                 Key: HDFS-14806
>                 URL: https://issues.apache.org/jira/browse/HDFS-14806
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.3.0
>            Reporter: Chen Liang
>            Assignee: Chen Liang
>            Priority: Major
>         Attachments: HDFS-14806.001.patch, HDFS-14806.002.patch
>
>
> One issue we came across was that if in-progress tailing is enabled, 
> bootstrap standby could fail.
> When in-progress tailing is enabled, Bootstrap uses the RPC mechanism to get 
> edits. There is a config {{dfs.ha.tail-edits.qjm.rpc.max-txns}} that sets an 
> upper bound on how many transactions can be included in one RPC call. The 
> default is 5000, meaning the bootstrapping NN (say NN1) can pull at most 5000 
> edits from the JNs. However, as part of bootstrap, NN1 queries another NN (say 
> NN2) for NN2's current transaction ID, and NN2 may return a state that is more 
> than 5000 transactions ahead of NN1's current image. But NN1 can only see 5000 
> more transactions from the JNs. At this point NN1 panics, because the txid 
> returned by the JNs is behind NN2's returned state, and bootstrap then fails.
> Essentially, bootstrap standby can fail if both of the following two 
> conditions are met:
>  # in-progress tailing is enabled AND
>  # the bootstrapping NN is too far (>5000 txids) behind
> Increasing the value of {{dfs.ha.tail-edits.qjm.rpc.max-txns}} to some very 
> large value allowed bootstrap to continue. But this is hardly the ideal 
> solution.
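
The failure condition in the description above can be modelled roughly as follows. This is a simplified sketch of the scenario as reported, not the actual BootstrapStandby logic: the class, method, and parameter names are hypothetical, and it assumes NN1 can reach at most {{max-txns}} transactions past its image and that the JNs are never behind the image.

```java
class BootstrapCheck {
    /**
     * Highest txid NN1 can see from the JNs when the fetch is capped at maxTxnsPerRpc.
     * Assumes latestJnTxId >= imageTxId; written to avoid overflow for very large caps.
     */
    static long highestReachableTxId(long imageTxId, long latestJnTxId, long maxTxnsPerRpc) {
        long gap = latestJnTxId - imageTxId;
        return gap <= maxTxnsPerRpc ? latestJnTxId : imageTxId + maxTxnsPerRpc;
    }

    /** Bootstrap succeeds only if NN1 can reach the txid that NN2 reported. */
    static boolean canBootstrap(long imageTxId, long nn2TxId, long latestJnTxId, long maxTxnsPerRpc) {
        return highestReachableTxId(imageTxId, latestJnTxId, maxTxnsPerRpc) >= nn2TxId;
    }

    public static void main(String[] args) {
        // NN1 image at txid 1000, NN2 and the JNs at txid 7000: 6000 behind.
        System.out.println(canBootstrap(1000L, 7000L, 7000L, 5000L));           // cap too small
        System.out.println(canBootstrap(1000L, 7000L, 7000L, Long.MAX_VALUE));  // workaround: huge cap
    }
}
```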



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

