[ https://issues.apache.org/jira/browse/PHOENIX-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824546#comment-17824546 ]
ASF GitHub Bot commented on PHOENIX-7253:
-----------------------------------------
virajjasani commented on PR #1848:
URL: https://github.com/apache/phoenix/pull/1848#issuecomment-1984557213
@dbwong here are some of the observations:
In both HBase 1 and 2, region locations for a given table are cached at the
HBase Connection level, and Phoenix CQSI connections are by default cached for
~24 hr. In fact, the issue we have seen with range-scan queries on large tables
occurs after the 24 hr mark, at which point multiple range-scan queries are hit
by the performance issue. Sometimes the queries take ~5-10 min, and sometimes
the thread that performs the meta table lookup gets interrupted. However, even
after 24 hr, a significant number of queries are still affected.
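For context, here is a minimal sketch of how the blocking HBase client resolves
and caches region locations through the Connection. The table name and row key
are placeholders, not anything from the PR:
{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionLocationCacheExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             RegionLocator locator =
                 connection.getRegionLocator(TableName.valueOf("MY_TABLE"))) {
            // First lookup for this row goes to meta and populates the
            // connection-level region location cache.
            HRegionLocation first = locator.getRegionLocation(Bytes.toBytes("row-key"));
            // Subsequent lookups for rows in the same region are served from
            // that cache (reload=false) until the entry is invalidated or the
            // connection is closed.
            HRegionLocation cached =
                locator.getRegionLocation(Bytes.toBytes("row-key"), false);
            System.out.println(first.getRegion().getRegionNameAsString());
            System.out.println(cached.getServerName());
        }
    }
}
{code}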
In the incident, we observed that the base table has ~138k regions, while the
queries are issued through tenant connections. Because we fetch all table
regions regardless of the nature of the query, we end up spending significant
time retrieving region locations and filling up the connection cache, even
though a given query on a tenant view likely does not need to touch more than
5-10 regions.
Hence, this fix significantly improves the performance of queries on large
tables (and tables shared by more tenants are usually larger). I am not
proposing any change to how the getAllTableRegions API is written; we can
continue to follow the same pattern, because HBase still does not provide an
API that takes the start and end keys of a scan range and returns the matching
region locations in a single call. I can file a Jira for that as well.
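To illustrate the idea (a rough sketch, not the PR's actual code), the
boundary-limited lookup can be done by walking meta one region at a time via
RegionLocator, starting at the scan's start key and stopping once a region
covers the scan's end key. The helper name getTableRegionsCoveringRange is made
up for this example:
{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public final class ScanRangeRegionLookup {

    /**
     * Hypothetical helper: return only the region locations whose key ranges
     * overlap [scanStartKey, scanEndKey), instead of all regions of the table.
     */
    public static List<HRegionLocation> getTableRegionsCoveringRange(
            Connection connection, TableName table,
            byte[] scanStartKey, byte[] scanEndKey) throws IOException {
        List<HRegionLocation> result = new ArrayList<>();
        try (RegionLocator locator = connection.getRegionLocator(table)) {
            byte[] currentKey = scanStartKey;
            while (true) {
                // Served from the connection-level cache when possible,
                // otherwise a meta lookup for just this one region.
                HRegionLocation location = locator.getRegionLocation(currentKey, false);
                result.add(location);
                byte[] regionEndKey = location.getRegion().getEndKey();
                boolean lastRegion = Bytes.equals(regionEndKey, HConstants.EMPTY_END_ROW);
                boolean coversScanEnd = !Bytes.equals(scanEndKey, HConstants.EMPTY_END_ROW)
                        && Bytes.compareTo(regionEndKey, scanEndKey) >= 0;
                if (lastRegion || coversScanEnd) {
                    break;
                }
                currentKey = regionEndKey;
            }
        }
        return result;
    }

    private ScanRangeRegionLookup() {
    }
}
{code}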
The current state of the PR addresses the performance issues for these queries:
- Range scan
- Any query using a tenant connection - full scan, range scan, or point lookup
- Point lookup or range scan on a salted table
- Point lookup or range scan on a salted table with tenant id and/or view
index id
The only queries that should still require all table region locations are the
ones that need a full base table scan.
Does this look good to you?
> Perf improvement for non-full scan queries on large table
> ---------------------------------------------------------
>
> Key: PHOENIX-7253
> URL: https://issues.apache.org/jira/browse/PHOENIX-7253
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 5.2.0, 5.1.3
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Critical
> Fix For: 5.2.0, 5.1.4
>
>
> Any considerably large table with more than 100k regions can exhibit problematic
> performance if we fetch all region locations from meta for the given table
> before generating parallel or sequential scans for a query. The perf impact
> can especially hurt range-scan queries.
> Consider a table with hundreds of thousands of tenant views. Unless the query
> is a strict point lookup, any query on any tenant view ends up retrieving the
> region locations of all regions of the base table. If the HBase client throws
> an IOException during any region location lookup in meta, we perform only a
> single retry.
> Proposal:
> # All non-point-lookup queries should retrieve only the region locations that
> cover the scan boundaries, rather than fetching all region locations of the
> base table.
> # Make the retries configurable, with a higher default value (see the sketch
> after this list).
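> As a minimal sketch of item 2 (not the actual patch; the property name and
> default below are illustrative placeholders), a single region-location lookup
> could be wrapped in a configurable retry loop:
> {code:java}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HRegionLocation;
> import org.apache.hadoop.hbase.client.RegionLocator;
>
> public final class RetryingRegionLookup {
>     // Hypothetical property name and default, for illustration only.
>     private static final String LOOKUP_RETRIES_ATTRIB =
>         "phoenix.region.location.lookup.retries";
>     private static final int DEFAULT_LOOKUP_RETRIES = 5;
>
>     public static HRegionLocation locateWithRetries(RegionLocator locator,
>             byte[] rowKey, Configuration config) throws IOException {
>         int maxRetries = config.getInt(LOOKUP_RETRIES_ATTRIB, DEFAULT_LOOKUP_RETRIES);
>         IOException lastError = null;
>         // One initial attempt plus up to maxRetries retries.
>         for (int attempt = 0; attempt <= maxRetries; attempt++) {
>             try {
>                 // Force a fresh meta lookup (reload=true) on retries to skip
>                 // a potentially stale cached location.
>                 return locator.getRegionLocation(rowKey, attempt > 0);
>             } catch (IOException e) {
>                 lastError = e;
>             }
>         }
>         throw lastError;
>     }
>
>     private RetryingRegionLookup() {
>     }
> }
> {code}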
>
> Sample stacktrace from the multiple failures observed:
> {code:java}
> java.sql.SQLException: ERROR 1102 (XCL02): Cannot get all table regions.Stack trace: java.sql.SQLException: ERROR 1102 (XCL02): Cannot get all table regions.
>     at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:620)
>     at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:229)
>     at org.apache.phoenix.query.ConnectionQueryServicesImpl.getAllTableRegions(ConnectionQueryServicesImpl.java:781)
>     at org.apache.phoenix.query.DelegateConnectionQueryServices.getAllTableRegions(DelegateConnectionQueryServices.java:87)
>     at org.apache.phoenix.query.DelegateConnectionQueryServices.getAllTableRegions(DelegateConnectionQueryServices.java:87)
>     at org.apache.phoenix.iterate.DefaultParallelScanGrouper.getRegionBoundaries(DefaultParallelScanGrouper.java:74)
>     at org.apache.phoenix.iterate.BaseResultIterators.getRegionBoundaries(BaseResultIterators.java:587)
>     at org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:936)
>     at org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:669)
>     at org.apache.phoenix.iterate.BaseResultIterators.<init>(BaseResultIterators.java:555)
>     at org.apache.phoenix.iterate.SerialIterators.<init>(SerialIterators.java:69)
>     at org.apache.phoenix.execute.ScanPlan.newIterator(ScanPlan.java:278)
>     at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:374)
>     at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:222)
>     at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:217)
>     at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:212)
>     at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:370)
>     at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:328)
>     at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>     at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:328)
>     at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:320)
>     at org.apache.phoenix.jdbc.PhoenixPreparedStatement.executeQuery(PhoenixPreparedStatement.java:188)
>     ...
>     ...
> Caused by: java.io.InterruptedIOException: Origin: InterruptedException
>     at org.apache.hadoop.hbase.util.ExceptionUtil.asInterrupt(ExceptionUtil.java:72)
>     at org.apache.hadoop.hbase.client.ConnectionImplementation.takeUserRegionLock(ConnectionImplementation.java:1129)
>     at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:994)
>     at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:895)
>     at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:881)
>     at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:851)
>     at org.apache.hadoop.hbase.client.ConnectionImplementation.getRegionLocation(ConnectionImplementation.java:730)
>     at org.apache.phoenix.query.ConnectionQueryServicesImpl.getAllTableRegions(ConnectionQueryServicesImpl.java:766)
>     ... 254 more
> Caused by: java.lang.InterruptedException
>     at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:982)
>     at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1288)
>     at java.base/java.util.concurrent.locks.ReentrantLock.tryLock(ReentrantLock.java:424)
>     at org.apache.hadoop.hbase.client.ConnectionImplementation.takeUserRegionLock(ConnectionImplementation.java:1117)
>     ... 264 more
> {code}