[jira] [Commented] (HBASE-22057) Impose upper-bound on size of ZK ops sent in a single multi()

Hudson (JIRA) Thu, 11 Apr 2019 03:22:22 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-22057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815292#comment-16815292
 ]


Hudson commented on HBASE-22057:
--------------------------------

Results for branch branch-1
        [build #766 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/766/]: 
(x) *{color:red}-1 overall{color}*
----
details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/766//General_Nightly_Build_Report/]


(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/766//JDK7_Nightly_Build_Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/766//JDK8_Nightly_Build_Report_(Hadoop2)/]




(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Impose upper-bound on size of ZK ops sent in a single multi()
> -------------------------------------------------------------
>
>                 Key: HBASE-22057
>                 URL: https://issues.apache.org/jira/browse/HBASE-22057
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Major
>             Fix For: 3.0.0, 1.6.0, 2.2.0
>
>         Attachments: HBASE-22057-branch-1.patch, HBASE-22057.001.patch, 
> HBASE-22057.002.patch, HBASE-22057.003.patch, HBASE-22057.004.patch
>
>
> In {{ZKUtil#multiOrSequential}}, we accept a list of {{ZKUtilOp}}s to pass 
> down to the {{ZooKeeper#multi(Iterable<Op>)}} method.
> One problem with this approach is that we may generate a large list of ZNodes 
> to mutate in one batch that exceeds the allowable client packet length, 
> specified by {{jute.maxbuffer}}.
> This problem can manifest when a large number of WALs queued in ZooKeeper for 
> replication belong to a disabled peer. When that peer is dropped, the RS 
> submits deletes for those queued WALs. The RS then sees ConnectionLoss for 
> the resulting {{multi()}} calls because the client message is too large 
> (it tries to delete too many WALs at once). The result (at least in 
> branch-1-ish versions) is that the RS aborts after exceeding the ZK retries, 
> as this operation can never succeed.
> A simple fix would be to impose a maximum number of Ops to run in a single 
> batch inside ZKUtil, and split apart the caller-submitted batch into smaller 
> chunks. Before we make such a change, I do need to make sure that we don't 
> have any expectations on atomicity of the operations. I'm not sure what ZK 
> provides here -- for the above example, splitting up batches of deletes is 
> not an issue, but there could be issues with batches of creates where we only 
> apply some.
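
The chunking idea in the description above can be sketched roughly as follows. This is only an illustration of splitting a caller-submitted list into bounded batches, not the patch attached to this issue: the class name BatchSplitter, the method partition, and the parameter maxOpsPerBatch are hypothetical, and the element type is left generic rather than tied to {{ZKUtilOp}}.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of HBase or ZooKeeper.
final class BatchSplitter {
    private BatchSplitter() {
    }

    /**
     * Splits ops into consecutive batches of at most maxOpsPerBatch elements,
     * preserving the original order. Each batch is an independent copy.
     */
    public static <T> List<List<T>> partition(List<T> ops, int maxOpsPerBatch) {
        if (maxOpsPerBatch <= 0) {
            throw new IllegalArgumentException("maxOpsPerBatch must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < ops.size(); start += maxOpsPerBatch) {
            // The last batch may be shorter than maxOpsPerBatch.
            int end = Math.min(start + maxOpsPerBatch, ops.size());
            batches.add(new ArrayList<>(ops.subList(start, end)));
        }
        return batches;
    }
}
```

Under this sketch, a caller would issue one {{multi()}} call per returned batch. Each call is still all-or-nothing on its own, but the full list no longer is: if a later batch fails, the earlier batches have already been applied, which is the atomicity concern raised in the description.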



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
