[ https://issues.apache.org/jira/browse/PHOENIX-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141681#comment-14141681 ]
Hudson commented on PHOENIX-180:
--------------------------------
FAILURE: Integrated in Phoenix | 4.0 | Hadoop2 #142 (See
[https://builds.apache.org/job/Phoenix-4.0-hadoop2/142/])
PHOENIX-180 Use stats to guide query parallelization (Ramkrishna S Vasudevan)
(jtaylor: rev 5cdc938e8f6ffc7db629f39951270e89dd4873b1)
* phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/StatsCollectorIT.java
* phoenix-core/src/test/java/org/apache/phoenix/query/BaseTest.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/GuidePostsLifeCycleIT.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/QueryIT.java
* phoenix-core/src/main/java/org/apache/phoenix/schema/PTable.java
* phoenix-core/src/main/java/org/apache/phoenix/schema/stat/StatisticsUtils.java
* phoenix-core/src/main/java/org/apache/phoenix/parse/ParseNodeFactory.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/StatsManagerIT.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/TenantSpecificTablesDDLIT.java
* phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java
* phoenix-core/src/main/java/org/apache/phoenix/schema/stat/PTableStats.java
* phoenix-core/src/main/java/org/apache/phoenix/schema/stat/StatisticsCollector.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/QueryDatabaseMetaDataIT.java
* phoenix-core/src/main/java/org/apache/phoenix/coprocessor/generated/StatCollectorProtos.java
* phoenix-core/src/main/java/org/apache/phoenix/query/QueryConstants.java
* phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java
* pom.xml
* phoenix-core/src/it/java/org/apache/phoenix/end2end/BaseClientManagedTimeIT.java
* phoenix-core/src/main/java/org/apache/phoenix/schema/PTableImpl.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/BaseQueryIT.java
* phoenix-core/src/main/java/org/apache/phoenix/schema/PColumnFamilyImpl.java
* phoenix-core/src/main/java/org/apache/phoenix/jdbc/PhoenixDatabaseMetaData.java
* phoenix-core/src/test/java/org/apache/phoenix/query/QueryServicesTestImpl.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/salted/SaltedTableUpsertSelectIT.java
* phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java
* phoenix-core/src/main/java/org/apache/phoenix/iterate/LocalIndexParallelIteratorRegionSplitter.java
* phoenix-core/src/main/java/org/apache/phoenix/query/DelegateConnectionQueryServices.java
* phoenix-core/src/main/java/org/apache/phoenix/schema/stat/StatisticsTable.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/ReverseScanIT.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/BaseParallelIteratorsRegionSplitterIT.java
* phoenix-core/src/main/java/org/apache/phoenix/query/QueryServices.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/DefaultParallelIteratorsRegionSplitterIT.java
* phoenix-core/src/main/java/org/apache/phoenix/schema/PColumnFamily.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/MultiCfQueryExecIT.java
* phoenix-protocol/src/main/StatisticsCollect.proto
* phoenix-core/src/main/antlr3/PhoenixSQL.g
* phoenix-core/src/main/java/org/apache/phoenix/jdbc/PhoenixStatement.java
* phoenix-core/src/main/java/org/apache/phoenix/coprocessor/generated/MetaDataProtos.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/TenantSpecificViewIndexIT.java
* phoenix-core/src/main/java/org/apache/phoenix/query/QueryServicesOptions.java
* phoenix-core/src/main/java/org/apache/phoenix/schema/stat/StatisticsScanner.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/index/MutableIndexIT.java
* phoenix-core/src/main/java/org/apache/phoenix/cache/GlobalCache.java
* phoenix-protocol/src/main/MetaDataService.proto
* phoenix-core/src/main/java/org/apache/phoenix/parse/UpdateStatisticsStatement.java
* phoenix-protocol/src/main/PTable.proto
* phoenix-core/src/it/java/org/apache/phoenix/end2end/HashJoinIT.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/index/SaltedIndexIT.java
* phoenix-core/src/main/java/org/apache/phoenix/schema/stat/StatisticsTracker.java
* phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionlessQueryServicesImpl.java
* phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServices.java
* phoenix-core/src/main/java/org/apache/phoenix/schema/stat/PTableStatsImpl.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/SkipRangeParallelIteratorRegionSplitterIT.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/ArrayIT.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/BaseViewIT.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/KeyOnlyIT.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/TenantSpecificTablesDMLIT.java
* phoenix-core/src/main/java/org/apache/phoenix/iterate/ParallelIteratorRegionSplitter.java
> Use stats to guide query parallelization
> ----------------------------------------
>
> Key: PHOENIX-180
> URL: https://issues.apache.org/jira/browse/PHOENIX-180
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: James Taylor
> Assignee: ramkrishna.s.vasudevan
> Labels: enhancement
> Attachments: Phoenix-180_V1.patch, Phoenix-180_V2.patch,
> Phoenix-180_WIP.patch, Phoenix-180_v3.patch, Phoenix-180_v5.patch
>
>
> We're currently not using stats, beyond a table-wide min key/max key cached
> per client connection, to guide parallelization. If a query targets just a
> few regions, we don't know how to evenly divide the work among threads,
> because we don't know the data distribution. This other
> [issue](https://github.com/forcedotcom/phoenix/issues/64) is targeting
> gathering and maintaining the stats, while this issue is focused on using
> the stats.
> The main changes are:
> 1. Create a PTableStats interface that encapsulates the stats information
> (and implements the Writable interface so that it can be serialized back from
> the server).
> 2. Add a stats member variable off of PTable to hold this.
> 3. From MetaDataEndpointImpl, look up the stats row for the table in the
> stats table. If the stats have changed, return a new PTable with the updated
> stats information. We may want to cache the stats row and have the stats
> gatherer invalidate the cached row when it is updated so we don't always
> have to do a scan for it. Additionally, it would be ideal if we could use
> the same split policy on the stats table that we use on the system table to
> guarantee co-location of data (for the sake of caching).
> 4. Modify the client-side parallelization (ParallelIterators.getSplits()) to
> use this information to guide how to chunk up the scans at query time.
> This should help boost query performance, especially in cases where the data
> is highly skewed. It's likely the cause for the slowness reported in this
> issue: https://github.com/forcedotcom/phoenix/issues/47.
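
As a rough illustration of the last change in the list above, here is a minimal Java sketch of how a scan's key range could be cut at stats boundaries. Everything in it is an assumption made for illustration: the class and method names are hypothetical and are not Phoenix's ParallelIterators.getSplits() code, keys are plain Strings rather than the byte arrays Phoenix uses for row keys, and the stats are assumed to arrive as a sorted list of boundary keys ("guide posts") with roughly equal amounts of data between consecutive entries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Phoenix: cuts one scan range into chunks
// at the guide-post keys that fall strictly inside it.
final class GuidePostSplitter {

    private GuidePostSplitter() {}

    /**
     * Splits the half-open range [start, end) at every guide post g with
     * start < g < end. The guide posts must be sorted in ascending order.
     * Each returned element is a two-element array {chunkStart, chunkEnd}.
     */
    static List<String[]> split(String start, String end, List<String> guidePosts) {
        List<String[]> chunks = new ArrayList<>();
        String lower = start;
        for (String gp : guidePosts) {
            if (gp.compareTo(lower) <= 0) {
                continue; // at or before the current lower bound: no cut here
            }
            if (gp.compareTo(end) >= 0) {
                break;    // list is sorted, so nothing further can fall inside
            }
            chunks.add(new String[] {lower, gp});
            lower = gp;
        }
        chunks.add(new String[] {lower, end}); // remainder up to the scan end
        return chunks;
    }
}
```

Under these assumptions each chunk covers a similar amount of data even when the keys are skewed, so handing one chunk to each thread spreads the work more evenly than dividing the key space itself. With no guide posts the range comes back as a single chunk.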
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)