[
https://issues.apache.org/jira/browse/HBASE-22057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815292#comment-16815292
]
Hudson commented on HBASE-22057:
--------------------------------
Results for branch branch-1
[build #766 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/766/]:
(x) *{color:red}-1 overall{color}*
----
details (if available):
(x) {color:red}-1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/766//General_Nightly_Build_Report/]
(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/766//JDK7_Nightly_Build_Report/]
(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/766//JDK8_Nightly_Build_Report_(Hadoop2)/]
(/) {color:green}+1 source release artifact{color}
-- See build output for details.
> Impose upper-bound on size of ZK ops sent in a single multi()
> -------------------------------------------------------------
>
> Key: HBASE-22057
> URL: https://issues.apache.org/jira/browse/HBASE-22057
> Project: HBase
> Issue Type: Bug
> Reporter: Josh Elser
> Assignee: Josh Elser
> Priority: Major
> Fix For: 3.0.0, 1.6.0, 2.2.0
>
> Attachments: HBASE-22057-branch-1.patch, HBASE-22057.001.patch,
> HBASE-22057.002.patch, HBASE-22057.003.patch, HBASE-22057.004.patch
>
>
> In {{ZKUtil#multiOrSequential}}, we accept a list of {{ZKUtilOp}}s to pass
> down to the {{ZooKeeper#multi(Iterable<Op>)}} method.
> One problem with this approach is that we may generate a large list of ZNodes
> to mutate in one batch that exceeds the allowable client packet length,
> specified by {{jute.maxbuffer}}.
> This problem can manifest when we have a large number of WALs to replicate,
> queued in ZooKeeper, from a disabled peer. When that peer is dropped, the RS
> would submit deletes of those queued WALs. The RS will see ConnectionLoss for
> the resulting {{multi()}} calls it tries to make, because we are sending too
> large a client message (because we're trying to delete too many WALs at
> once). The result (at least in branch-1-ish versions) is that the RS aborts
> after exceeding the ZK retries (as this operation will never succeed).
> A simple fix would be to impose a maximum number of Ops to run in a single
> batch inside ZKUtil, and split apart the caller-submitted batch into smaller
> chunks. Before we make such a change, I need to make sure that we don't
> rely on the atomicity of the operations anywhere. I'm not sure what ZK
> provides here -- for the above example, splitting up batches of deletes is
> not an issue, but there could be issues with batches of creates where we only
> apply some.
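
The splitting idea proposed above could look roughly like the following. This
is only a minimal sketch, not the code from the attached patches: the class
name OpBatcher, the method partition, and the parameter maxOpsPerBatch are
hypothetical names chosen for illustration. It bounds the number of ops per
batch rather than their serialized size, so it assumes a count limit is a good
enough proxy for staying under {{jute.maxbuffer}}.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase or ZooKeeper.
final class OpBatcher {
  private OpBatcher() {}

  /**
   * Splits ops into consecutive batches of at most maxOpsPerBatch elements,
   * preserving the original order. An empty input yields no batches.
   */
  static <T> List<List<T>> partition(List<T> ops, int maxOpsPerBatch) {
    if (maxOpsPerBatch <= 0) {
      throw new IllegalArgumentException("maxOpsPerBatch must be positive");
    }
    List<List<T>> batches = new ArrayList<>();
    for (int start = 0; start < ops.size(); start += maxOpsPerBatch) {
      int end = Math.min(start + maxOpsPerBatch, ops.size());
      // Copy the view so each batch is independent of the caller's list.
      batches.add(new ArrayList<>(ops.subList(start, end)));
    }
    return batches;
  }
}
```

A caller would then issue one {{multi()}} per returned batch. Each batch would
succeed or fail on its own, so any all-or-nothing guarantee the caller expected
for the whole list no longer holds across batches, which is the atomicity
concern raised above.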