[
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16260121#comment-16260121
]
Appy edited comment on HBASE-19290 at 11/21/17 1:09 AM:
--------------------------------------------------------
bq. int sleepTime = RandomUtils.nextInt(0, 100) + 500;
Why randomize? Can this be a constant?
----
bq. if (taskGrabbed == 0 && !shouldStop) {
So if there are 2 available splitters and only one task was grabbed, we don't
stop here and keep hammering ZK?
----
{quote}
int idx = (i + offset) % paths.size();
// don't call ZKSplitLog.getNodeName() because that will lead to
// double encoding of the path name
taskGrabbed +=
    grabTask(ZNodePaths.joinZNode(watcher.znodePaths.splitLogZNode, paths.get(idx))) ? 1 : 0;
{quote}
Can this be done in the "if" condition itself?
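One possible reading of this suggestion, as a minimal sketch: count a grabbed task with a plain "if" rather than adding the result of a ternary. The class name GrabLoopSketch and the stubbed grabTask below are hypothetical stand-ins for illustration. They are not the HBase worker code, and the ZNode path joining is left out.

```java
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the worker loop; not the HBase implementation.
final class GrabLoopSketch {
  private final Set<String> grabbable;

  GrabLoopSketch(Set<String> grabbable) {
    this.grabbable = grabbable;
  }

  // Stub: pretends a task is acquired when its path is in the grabbable set.
  boolean grabTask(String path) {
    return grabbable.contains(path);
  }

  // Walks the paths starting at offset and counts grabbed tasks with a
  // plain "if" instead of "taskGrabbed += cond ? 1 : 0".
  int grabAll(List<String> paths, int offset) {
    int taskGrabbed = 0;
    for (int i = 0; i < paths.size(); i++) {
      int idx = (i + offset) % paths.size();
      if (grabTask(paths.get(idx))) {
        taskGrabbed++;
      }
    }
    return taskGrabbed;
  }
}
```

The behavior is the same as the quoted snippet. Only the way the counter is incremented changes.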
----
bq. taskReadySeq.wait may not execute because it has condition.
That while condition is just to handle spurious wakeups. See Object#wait. You
can definitely remove the second sleep (unless there's a concrete reason not
to).
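To illustrate the point about the while condition, here is a minimal sketch of the standard guarded-wait idiom described in the Object#wait javadoc. The class TaskReadySketch and its method names are hypothetical and only loosely modeled on the taskReadySeq counter. This is not the patch's code.

```java
// Hypothetical stand-in for the worker's "task ready" sequence counter.
final class TaskReadySketch {
  private final Object lock = new Object();
  private int seq = 0;

  // Called when new work shows up; bumps the counter and wakes all waiters.
  void signalTaskReady() {
    synchronized (lock) {
      seq++;
      lock.notifyAll();
    }
  }

  // Blocks until the counter differs from seenSeq, then returns its value.
  // The loop re-checks the condition because wait() may return without any
  // notify (a spurious wakeup). If the counter has already changed, the
  // loop body is skipped and the method returns without waiting.
  int awaitChange(int seenSeq) throws InterruptedException {
    synchronized (lock) {
      while (seq == seenSeq) {
        lock.wait();
      }
      return seq;
    }
  }
}
```

When the wait is skipped in this idiom, it is because the awaited change has already happened, so under that assumption no extra sleep is needed to cover that case.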
> Reduce zk request when doing split log
> --------------------------------------
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
> Issue Type: Improvement
> Reporter: binlijin
> Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch,
> HBASE-19290.master.002.patch
>
>
> We observe that once a cluster has 1000+ nodes and hundreds of nodes abort
> and start log splitting, the split is very slow. We find that the
> regionservers and master are waiting on ZooKeeper responses, so we need to
> reduce ZooKeeper requests and pressure for big clusters.
> (1) Reduce requests to rsZNode. Every call to calculateAvailableSplitters
> gets rsZNode's children from ZooKeeper, which is heavy when the cluster is
> huge. This patch reduces these requests.
> (2) When a regionserver already has the maximum number of split tasks
> running, it may still try to grab tasks and issue ZooKeeper requests. It
> should sleep and wait until it can grab tasks again.
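The description does not say how the patch reduces the rsZNode requests in item (1). One common way to cut repeated lookups is a short time-to-live cache, sketched below. This is an assumption for illustration, not the patch's approach: the class CachedServerCount, the loader that stands in for the ZooKeeper children call, and the TTL value are all hypothetical.

```java
import java.util.function.IntSupplier;
import java.util.function.LongSupplier;

// Hypothetical cache around an expensive "count region servers" lookup.
final class CachedServerCount {
  private final IntSupplier loader;   // stands in for the ZooKeeper children call
  private final LongSupplier clockMs; // injectable clock, in milliseconds
  private final long ttlMs;           // how long a loaded value stays fresh
  private long loadedAt;
  private int cached;
  private boolean loaded = false;

  CachedServerCount(IntSupplier loader, LongSupplier clockMs, long ttlMs) {
    this.loader = loader;
    this.clockMs = clockMs;
    this.ttlMs = ttlMs;
  }

  // Returns the cached count, calling the loader only on first use or
  // once the previous value is at least ttlMs old.
  synchronized int get() {
    long now = clockMs.getAsLong();
    if (!loaded || now - loadedAt >= ttlMs) {
      cached = loader.getAsInt();
      loadedAt = now;
      loaded = true;
    }
    return cached;
  }
}
```

A caller could construct it as new CachedServerCount(loader, System::currentTimeMillis, 1000L). The trade-off is that the count can be stale for up to the TTL.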