[
https://issues.apache.org/jira/browse/HBASE-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343790#comment-15343790
]
Elliott Clark commented on HBASE-15146:
---------------------------------------
bq.In general, gradually reducing performance is rather preferable in heavy
load.
We've found the exact opposite many, many times. Pushing back on the client is a
well-known and well-understood load-shedding mechanism. It allows the server to
take what it can handle and no more.
By contrast, every time the server promises to do work that it can't handle,
things get worse. GC gets worse, queue call times get worse, and it becomes a
cycle that continues until a regionserver is inoperable. Removing threads
that can call select leads to multiple seconds where no TCP ACKs are sent. On
loaded servers we saw all reader threads completely stop doing network selects.
bq.Selector.select immediately causes a context switch when an event occurs,
and this patch might make worse performance in such subtle heavy congestion.
Yes it does, and you want to get the reader threads back to calling select
as fast as possible. That's the most basic tenet of an event loop. What was
happening was that the threads would stop for multiple seconds because the
queues were full. That meant the event loop was stopped.
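To make the difference concrete, here is a minimal sketch of a reader handing a
call to a bounded queue. It is not the actual HBase scheduler code: the class
DispatchSketch and its method names are hypothetical, and only the standard
java.util.concurrent queue calls (put and offer) are real. put() parks the
calling thread while the queue is full, which is what stalls the select loop;
offer() returns false immediately, so the reader can reject the call and go
back to select.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration only; not HBase's RpcScheduler implementation. */
class DispatchSketch {
    // Bounded queue standing in for one of the scheduler's call queues.
    private final BlockingQueue<Runnable> callQueue;
    // Counts calls that were pushed back because the queue was full.
    private final AtomicLong rejected = new AtomicLong();

    DispatchSketch(int capacity) {
        this.callQueue = new ArrayBlockingQueue<>(capacity);
    }

    /** Blocking hand-off: put() waits while the queue is full, stalling the caller. */
    void dispatchBlocking(Runnable call) throws InterruptedException {
        callQueue.put(call);
    }

    /** Non-blocking hand-off: offer() returns false at once if the queue is full. */
    boolean dispatch(Runnable call) {
        boolean accepted = callQueue.offer(call);
        if (!accepted) {
            // Assumption: the caller replies to the client with an error here
            // so the client can back off and retry.
            rejected.incrementAndGet();
        }
        return accepted;
    }

    long rejectedCount() {
        return rejected.get();
    }

    int queued() {
        return callQueue.size();
    }
}
```

With a capacity of 2, a third call to dispatch() returns false without
blocking, so the calling thread is free to return to select right away.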
> Don't block on Reader threads queueing to a scheduler queue
> -----------------------------------------------------------
>
> Key: HBASE-15146
> URL: https://issues.apache.org/jira/browse/HBASE-15146
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.2.0
> Reporter: Elliott Clark
> Assignee: Elliott Clark
> Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-15146-v7.patch, HBASE-15146-v8.patch,
> HBASE-15146-v8.patch, HBASE-15146.0.patch, HBASE-15146.1.patch,
> HBASE-15146.2.patch, HBASE-15146.3.patch, HBASE-15146.4.patch,
> HBASE-15146.5.patch, HBASE-15146.6.patch
>
>
> Blocking on the epoll thread is awful. The new rpc scheduler can have lots of
> different queues. Those queues have different capacity limits. Currently the
> dispatch method can block trying to add to the blocking queue in any of the
> schedulers.
> This causes readers to block, TCP ACKs to be delayed, and everything to slow down.