Viraj Jasani created PHOENIX-7253:
-------------------------------------
Summary: Perf improvement for non-full scan queries on large table
Key: PHOENIX-7253
URL: https://issues.apache.org/jira/browse/PHOENIX-7253
Project: Phoenix
Issue Type: Improvement
Affects Versions: 5.1.3, 5.2.0
Reporter: Viraj Jasani
Any large table with more than 100k regions can perform poorly if we fetch
all of its region locations from meta before generating parallel or sequential
scans for a query. Range scan queries are hit especially hard.
Consider a table with hundreds of thousands of tenant views. Unless the query
is a strict point lookup, any query on any tenant view ends up retrieving the
locations of all regions of the base table. If the HBase client throws an
IOException during any region location lookup in meta, we perform only a
single retry.
Proposal:
# All non-point-lookup queries should retrieve only the region locations that
cover the scan boundary, instead of fetching all region locations of the base
table.
# Make the number of retries configurable, with a higher default value.
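A minimal sketch of the two proposal items, not the actual Phoenix change. It assumes a hypothetical RegionLocator interface standing in for the meta lookup, a hypothetical Region class, and String row keys in place of HBase's byte[] keys. The walk starts at the scan start key and follows region end keys until the scan stop key is covered, so regions outside the scan boundary are never looked up. The retry count is a plain parameter here; how it would be exposed as configuration is left open.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.ArrayList;
import java.util.List;

public class ScanBoundaryRegionLookup {

    /** Hypothetical region descriptor covering [startKey, endKey); "" means unbounded. */
    static final class Region {
        final String startKey;
        final String endKey;

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /** Hypothetical stand-in for a meta lookup: returns the region containing rowKey. */
    interface RegionLocator {
        Region locate(String rowKey) throws IOException;
    }

    /** Looks up one region, retrying on IOException up to maxRetries extra times. */
    static Region locateWithRetries(RegionLocator locator, String rowKey, int maxRetries)
            throws IOException {
        int attempts = Math.max(1, maxRetries + 1);
        IOException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return locator.locate(rowKey);
            } catch (InterruptedIOException e) {
                // An interrupt is not a transient lookup failure, so do not retry it.
                throw e;
            } catch (IOException e) {
                last = e;
            }
        }
        throw last;
    }

    /**
     * Returns only the regions that overlap [scanStart, scanStop).
     * An empty scanStop means "scan to the end of the table".
     */
    static List<Region> regionsInRange(RegionLocator locator, String scanStart,
            String scanStop, int maxRetries) throws IOException {
        List<Region> result = new ArrayList<>();
        String current = scanStart;
        while (true) {
            Region region = locateWithRetries(locator, current, maxRetries);
            result.add(region);
            String end = region.endKey;
            boolean lastRegionOfTable = end.isEmpty();
            boolean coversScanStop = !scanStop.isEmpty() && end.compareTo(scanStop) >= 0;
            if (lastRegionOfTable || coversScanStop) {
                return result;
            }
            // The next region starts where this one ends.
            current = end;
        }
    }
}
```

With this shape, a range scan that touches two regions costs two lookups regardless of how many regions the base table has, while a scan with empty start and stop keys still walks the whole table.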
Sample stacktrace from the multiple failures observed:
{code:java}
java.sql.SQLException: ERROR 1102 (XCL02): Cannot get all table regions.Stack trace: java.sql.SQLException: ERROR 1102 (XCL02): Cannot get all table regions.
    at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:620)
    at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:229)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.getAllTableRegions(ConnectionQueryServicesImpl.java:781)
    at org.apache.phoenix.query.DelegateConnectionQueryServices.getAllTableRegions(DelegateConnectionQueryServices.java:87)
    at org.apache.phoenix.query.DelegateConnectionQueryServices.getAllTableRegions(DelegateConnectionQueryServices.java:87)
    at org.apache.phoenix.iterate.DefaultParallelScanGrouper.getRegionBoundaries(DefaultParallelScanGrouper.java:74)
    at org.apache.phoenix.iterate.BaseResultIterators.getRegionBoundaries(BaseResultIterators.java:587)
    at org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:936)
    at org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:669)
    at org.apache.phoenix.iterate.BaseResultIterators.<init>(BaseResultIterators.java:555)
    at org.apache.phoenix.iterate.SerialIterators.<init>(SerialIterators.java:69)
    at org.apache.phoenix.execute.ScanPlan.newIterator(ScanPlan.java:278)
    at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:374)
    at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:222)
    at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:217)
    at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:212)
    at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:370)
    at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:328)
    at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:328)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:320)
    at org.apache.phoenix.jdbc.PhoenixPreparedStatement.executeQuery(PhoenixPreparedStatement.java:188)
    ...
    ...
Caused by: java.io.InterruptedIOException: Origin: InterruptedException
    at org.apache.hadoop.hbase.util.ExceptionUtil.asInterrupt(ExceptionUtil.java:72)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.takeUserRegionLock(ConnectionImplementation.java:1129)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:994)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:895)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:881)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:851)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.getRegionLocation(ConnectionImplementation.java:730)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.getAllTableRegions(ConnectionQueryServicesImpl.java:766)
    ... 254 more
Caused by: java.lang.InterruptedException
    at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:982)
    at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1288)
    at java.base/java.util.concurrent.locks.ReentrantLock.tryLock(ReentrantLock.java:424)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.takeUserRegionLock(ConnectionImplementation.java:1117)
    ... 264 more
{code}