[
https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432029#comment-13432029
]
Himanshu Vashishtha commented on HBASE-6165:
--------------------------------------------
[~eclark]: I used "custom" because the current naming scheme is not appropriate,
in my opinion (I started with medium/semi QOS, but then changed it to Custom).
"Priority" is something of a misnomer, as there is no priority as such; it's just
a different set of handlers serving the requests.
Though we call them priorityHandlers, etc., they are just like regular handlers
but for meta operations. I think we should rename them to metaOpsHandlers
(or metaHandlers). Yes, I just used a threshold between 0 and 10.
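To illustrate why "priority" is a misnomer here, the following is a minimal sketch of how a request's QOS level could select one of three separate handler pools. The class name, the enum, and the constant values (0, 5, 10) are assumptions made for illustration; they are not taken from the patch.

```java
// Hypothetical sketch: a QOS level only selects which handler pool serves a call.
class HandlerRouting {
    // Assumed levels for illustration only; not taken from the patch.
    static final int NORMAL_QOS = 0;
    static final int CUSTOM_QOS = 5;      // somewhere between 0 and 10
    static final int QOS_THRESHOLD = 10;  // at or above this, the meta-ops pool is used

    enum Pool { REGULAR, CUSTOM, META }

    // Returns the pool that would serve a call with the given QOS level.
    // If no custom handlers are configured, calls fall back to the regular pool.
    static Pool choosePool(int qos, int customHandlerCount) {
        if (qos >= QOS_THRESHOLD) {
            return Pool.META;
        }
        if (qos > NORMAL_QOS && customHandlerCount > 0) {
            return Pool.CUSTOM;
        }
        return Pool.REGULAR;
    }
}
```

Nothing in this sketch is ordered or preempted; each pool simply drains its own queue, which is why a name like metaHandlers may describe the top pool better than priorityHandlers.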
bq. Since this starts 0 "custom" priority handlers by default it will add
another undocumented step when enabling replication. We should either make the
number of handlers start by default > 0, or have the number depend on if
replication is enabled.
I am OK with a default > 0; I don't think it should be tied to replication, as
these handlers can be used for other methods too (such as security).
@Lars:
bq. The naming is weird. These are not "Custom"QOS, but "Medium"QOS methods,
right?
I hope the rationale above makes sense now.
bq. By default now (if hbase.regionserver.custom.priority.handler.count is not
set), replicateWALEntry would use non-priority handlers... Which is not right,
I think. It should revert back to the current behavior in that case (which is
to use the priorityQOS).
Does a default > 0 sound good?
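As a rough sketch of what a default > 0 could look like, the snippet below reads hbase.regionserver.custom.priority.handler.count and falls back to a non-zero default when the key is not set. It uses java.util.Properties as a stand-in for the real configuration object, and the default of 3 is an assumed value for illustration, not a proposal from the patch.

```java
import java.util.Properties;

// Hypothetical sketch: resolve the custom handler count with a default > 0.
class CustomHandlerConfig {
    static final String KEY = "hbase.regionserver.custom.priority.handler.count";
    static final int DEFAULT_COUNT = 3; // assumed value, for illustration only

    // Returns the configured count, or DEFAULT_COUNT when the key is not set.
    static int customHandlerCount(Properties conf) {
        String raw = conf.getProperty(KEY);
        if (raw == null) {
            return DEFAULT_COUNT;
        }
        return Integer.parseInt(raw.trim());
    }
}
```

With a default like this, enabling replication would not need an extra configuration step, while an operator could still set the key explicitly to 0.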
bq. What I still do not understand... Does this problem always happen? Does it
happen because replicateWALEntry takes too long to finish? Does this only
happen when the slave is already degraded for other reasons? Should we also
work on replicateWALEntry failing faster in case of problems (shorter/fewer
retries, etc)?
It can occur when the slave cluster is slow. And whenever it happens, it will
make the entire cluster unresponsive. I have a patch which adds fail-fast
behavior in the sink, and I have been testing it too. It looks good so far. I
tried creating a new JIRA but hit an IOE while creating it (see INFRA-5131). I
will attach the patch once it's created.
> Replication can overrun .META scans on cluster re-start
> -------------------------------------------------------
>
> Key: HBASE-6165
> URL: https://issues.apache.org/jira/browse/HBASE-6165
> Project: HBase
> Issue Type: Bug
> Reporter: Elliott Clark
> Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster, the
> replication from another cluster tied up every xceiver, meaning nothing could
> be onlined.