[ https://issues.apache.org/jira/browse/PHOENIX-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200775#comment-15200775 ]
Samarth Jain edited comment on PHOENIX-2724 at 3/18/16 1:32 AM:
----------------------------------------------------------------
I created a table with 300 million rows and 330K+ guideposts, then did some
micro-benchmarking to see where the time is going. This is what I have for:
select * from testExplainPlanTime limit 10;
Time spent computing 330045 scans in BaseResultIterators.getParallelScans(): 122 ms
(this includes the time spent in ScanRanges.intersectScan(): 82 ms)
In SerialIterators.java, time spent by a single thread creating 330045
iterators: 1589 ms
Total time spent in the above tasks = 1589 + 122 = 1711 ms
Overall query time = 1809 ms
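To put the measurements above in proportion, here is a small sketch of the arithmetic. The class and method names are made up for illustration; only the constants come from the numbers quoted in this comment. It works out to roughly 88% of the query time going to iterator creation, or a bit under 5 microseconds per iterator.

```java
// Hypothetical helper, not part of Phoenix. The constants are the
// measurements quoted in this comment.
final class TimingBreakdown {
    static final int ITERATORS = 330_045;
    static final long CREATE_ITERATORS_MS = 1_589;
    static final long PARALLEL_SCANS_MS = 122;
    static final long OVERALL_QUERY_MS = 1_809;

    /** Percentage of the overall query time taken by partMs. */
    static double percentOfQuery(long partMs) {
        return 100.0 * partMs / OVERALL_QUERY_MS;
    }

    /** Average cost of creating one iterator, in microseconds. */
    static double microsPerIterator() {
        return 1_000.0 * CREATE_ITERATORS_MS / ITERATORS;
    }

    public static void main(String[] args) {
        System.out.printf("iterator creation: %.1f%% of query time%n",
                percentOfQuery(CREATE_ITERATORS_MS));
        System.out.printf("getParallelScans:  %.1f%% of query time%n",
                percentOfQuery(PARALLEL_SCANS_MS));
        System.out.printf("per iterator:      %.2f us%n", microsPerIterator());
    }
}
```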
So it turns out the single biggest culprit is this piece of code in
SerialIterators.java:
{code}
@Override
public PeekingResultIterator call() throws Exception {
    // Timing instrumentation added for this micro-benchmark.
    long startTime = System.currentTimeMillis();
    List<PeekingResultIterator> concatIterators =
            Lists.newArrayListWithExpectedSize(scans.size());
    // One TableResultIterator is created eagerly for every scan.
    for (final Scan scan : scans) {
        TableResultIterator scanner = new TableResultIterator(mutationState, tableRef, scan,
                context.getReadMetricsQueue().allotMetric(SCAN_BYTES, tableName),
                renewLeaseThreshold);
        conn.addIterator(scanner);
        concatIterators.add(iteratorFactory.newIterator(context, scanner, scan, tableName));
    }
    PeekingResultIterator concatIterator = ConcatResultIterator.newIterator(concatIterators);
    allIterators.add(concatIterator);
    System.out.println("Serial iterators - time taken to create " + scans.size()
            + " iterators : " + (System.currentTimeMillis() - startTime));
    return concatIterator;
}
{code}
Looping over 330K+ scans and creating an iterator for each of them accounts
for most of the query time (1589 ms out of 1809 ms).
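One possible direction, sketched here only as an illustration and not as the fix for this issue, is to create the per-scan iterators lazily, so a query with a small LIMIT opens only the scans it actually reads. LazyConcatIterator, its type parameters S (standing in for a scan) and T (standing in for a row), and the opener function are all hypothetical names, not Phoenix APIs. A real change would also have to deal with the conn.addIterator() registration and the metric allotment that the loop above does per scan.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch: concatenates sources, opening each one on demand. */
final class LazyConcatIterator<S, T> implements Iterator<T> {
    private final Iterator<S> sources;
    private final Function<S, Iterator<T>> opener;
    private Iterator<T> current; // null until the first source is opened

    LazyConcatIterator(List<S> sources, Function<S, Iterator<T>> opener) {
        this.sources = sources.iterator();
        this.opener = opener;
    }

    @Override
    public boolean hasNext() {
        // Open the next source only once the current one is exhausted.
        while (current == null || !current.hasNext()) {
            if (!sources.hasNext()) {
                return false;
            }
            current = opener.apply(sources.next());
        }
        return true;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    public static void main(String[] args) {
        AtomicInteger opened = new AtomicInteger();
        List<List<String>> chunks = Arrays.asList(
                Arrays.asList("r1", "r2"), Arrays.asList("r3"), Arrays.asList("r4"));
        Iterator<String> rows = new LazyConcatIterator<List<String>, String>(chunks, chunk -> {
            opened.incrementAndGet(); // count how many sources were opened
            return chunk.iterator();
        });
        // Emulate a LIMIT 2 query: stop after two rows.
        for (int i = 0; i < 2 && rows.hasNext(); i++) {
            System.out.println(rows.next());
        }
        System.out.println("sources opened: " + opened.get());
    }
}
```

In this toy run the two requested rows both come from the first chunk, so the remaining chunks are never opened. The cost of creating an iterator then scales with the number of scans actually consumed rather than with the number of guideposts.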
was (Author: samarthjain):
I created a table with 300 million rows and 330K+ guideposts. I did some
micro-benchmarking to see where we are spending time and this is what I have:
select * from testExplainPlanTime limit 10;
Time spent in computing 698818 BaseResultIterators.getParallelScans() 1858 ms
In SerialIterators.java, time spent by the thread in creating 698818 iterators
: 3644
Total time taken:
Time spent in computing 330045 BaseResultIterators.getParallelScans() 122 ms
Included in the above time is the time spent in ScanRanges.intersectScan() 82 ms
In SerialIterators.java, time spent by a single thread in creating 330045
iterators : 1589
Total time spent in above tasks = 1589 + 122 = 1711 ms
Overall query time = 1809 ms
> Query with large number of guideposts is slower compared to no stats
> --------------------------------------------------------------------
>
> Key: PHOENIX-2724
> URL: https://issues.apache.org/jira/browse/PHOENIX-2724
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.7.0
> Environment: Phoenix 4.7.0-RC4, HBase-0.98.17 on a 8 node cluster
> Reporter: Mujtaba Chohan
> Assignee: Samarth Jain
> Fix For: 4.8.0
>
>
> With a 1MB guidepost width on a ~900GB/500M-row table, queries with a short
> scan range get significantly slower.
> Without stats:
> {code}
> select * from T limit 10; // query execution time <100 msec
> {code}
> With stats:
> {code}
> select * from T limit 10; // query execution time >20 seconds
> Explain plan: CLIENT 876085-CHUNK 476569382 ROWS 876060986727 BYTES SERIAL
> 1-WAY FULL SCAN OVER T SERVER 10 ROW LIMIT CLIENT 10 ROW LIMIT
> {code}