Sergey Shelukhin created HBASE-22334:
----------------------------------------
Summary: handle blocking RPC threads better (time out calls?)
Key: HBASE-22334
URL: https://issues.apache.org/jira/browse/HBASE-22334
Project: HBase
Issue Type: Bug
Reporter: Sergey Shelukhin
Combined with HBASE-22333, we hit a case where a user sent many create-table
requests with pre-splits for the same table (the tasks of some job each tried
to create the table opportunistically if it didn't exist, and there were many
such tasks). These requests took up all the RPC threads and caused a large call
queue to form; the first call then got stuck because the RS calls reporting an
opened region were stuck in that queue. All the other calls were stuck here:
{noformat}
submitProcedure(
    new CreateTableProcedure(procedureExecutor.getEnvironment(), desc,
        newRegions, latch));
latch.await();
{noformat}
The procedures in this case were stuck for hours; even if the other issue were
resolved, assigning thousands of regions can take a long time and cause a long
delay before it unblocks the other procedures and allows them to release
the latch.
In general, waiting on an RPC thread is not a good idea. I wonder if it would
make sense to fail client requests that hold an RPC thread once a timeout
expires, or when they are not making progress (e.g. in this case, the procedure
is not getting updated; this might need to be handled on a case-by-case basis).
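As a rough illustration of the timeout idea, here is a minimal sketch that bounds the wait instead of calling await() with no limit. It uses a plain java.util.concurrent.CountDownLatch as a stand-in for the latch in the snippet above; the class name BoundedLatchWait, the method awaitOrTimeout, and the timeout values are hypothetical and are not part of HBase.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not HBase code: bounds how long a handler thread waits.
public class BoundedLatchWait {

    // Waits for the latch for at most timeoutMs milliseconds. If the latch is
    // not released in time, throws so the caller can fail the client request
    // and free the RPC thread instead of blocking indefinitely.
    public static void awaitOrTimeout(CountDownLatch latch, long timeoutMs)
            throws InterruptedException, TimeoutException {
        if (!latch.await(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException(
                "latch not released within " + timeoutMs + " ms");
        }
    }

    public static void main(String[] args) throws Exception {
        // Case 1: the latch is already released, so the wait returns at once.
        CountDownLatch released = new CountDownLatch(1);
        released.countDown();
        awaitOrTimeout(released, 100);
        System.out.println("released latch: returned normally");

        // Case 2: nothing ever releases the latch, so the wait gives up.
        CountDownLatch stuck = new CountDownLatch(1);
        try {
            awaitOrTimeout(stuck, 100);
        } catch (TimeoutException e) {
            System.out.println("stuck latch: " + e.getMessage());
        }
    }
}
```

A timeout like this only frees the handler thread; the underlying procedure would keep running, so the client would presumably still need a way to look up its outcome later.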