Viraj Jasani created PHOENIX-7390:
-------------------------------------
Summary: Phoenix metadata updates should fail-fast for noisy neighbor
Key: PHOENIX-7390
URL: https://issues.apache.org/jira/browse/PHOENIX-7390
Project: Phoenix
Issue Type: Improvement
Affects Versions: 5.1.3, 5.2.0
Reporter: Viraj Jasani

Phoenix is a high-scale, low-latency, high-throughput multi-tenant database.
Multi-tenancy comes with its own set of challenges, one of which is the noisy
neighbor problem. A single client can initiate a very large number of tenant
view updates (e.g. drop view, create view, create index) while all other
clients are making RPC calls to SYSTEM.CATALOG to retrieve the updated PTable
objects. As the number of metadata update calls grows, more RPC calls can get
stuck waiting to acquire an HBase RowLock. We have also seen high memory
pressure as the number of metadata update calls increases.

By default, HBase RowLock waits up to 30s to acquire a lock; this timeout is
configurable via hbase.rowlock.wait.duration. While that setting applies at the
cluster level, Phoenix metadata RPC calls are expected to use a much lower
timeout for RowLock acquisition, because metadata updates and reads are
expected to be extremely low-latency operations. Without a lower timeout, we
are essentially either preventing some clients from getting enough RPC handlers
to execute getTable RPC calls, or causing significant delays to ongoing
getTable calls.

While HBASE-28797 proposes a new Region API for acquiring RowLock, Phoenix
already has its own RowLock implementation, which getTable RPC calls already
use to protect metadata server-side cache updates (PHOENIX-7363).

This Jira proposes to stop using HBase RowLock for all Phoenix metadata
operations and to use Phoenix RowLock instead, with a default timeout of 3 sec.
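
To illustrate the fail-fast behavior proposed here, the following is a minimal
sketch of a per-row lock with a bounded wait. It is not Phoenix's actual
RowLock implementation: the class name FailFastRowLocks, the use of String row
keys, and the DEFAULT_TIMEOUT_MS constant are all hypothetical, chosen only to
show a waiter giving up after a short timeout instead of holding an RPC handler
for the full 30s.

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch of a fail-fast row lock manager.
 * Not the Phoenix implementation; names and structure are assumptions.
 */
final class FailFastRowLocks {
    // Assumed default, taken from the 3 sec value proposed in this issue.
    static final long DEFAULT_TIMEOUT_MS = 3_000L;

    // One lock per row key. Entries are never evicted in this sketch;
    // a real implementation would need to clean up unused locks.
    private final ConcurrentHashMap<String, ReentrantLock> locks =
        new ConcurrentHashMap<>();
    private final long timeoutMs;

    FailFastRowLocks(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /**
     * Acquires the lock for rowKey, waiting at most timeoutMs.
     * The caller must call unlock() on the returned lock, ideally in finally.
     *
     * @throws IOException if the lock is not acquired in time or the
     *                     waiting thread is interrupted
     */
    ReentrantLock acquire(String rowKey) throws IOException {
        ReentrantLock lock =
            locks.computeIfAbsent(rowKey, k -> new ReentrantLock());
        try {
            // Fail fast: give up once the bounded wait elapses.
            if (!lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                throw new IOException("Timed out after " + timeoutMs
                    + " ms waiting for row lock on " + rowKey);
            }
        } catch (InterruptedException e) {
            // Preserve the interrupt status for the caller.
            Thread.currentThread().interrupt();
            throw new IOException(
                "Interrupted while waiting for row lock on " + rowKey, e);
        }
        return lock;
    }
}
```

A caller would wrap the metadata operation as acquire(rowKey), then the work in
a try block, then unlock() in finally. With a short timeout, a burst of view
updates from one client makes competing calls fail quickly with an error rather
than queue behind the lock.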
--
This message was sent by Atlassian Jira
(v8.20.10#820010)