[ https://issues.apache.org/jira/browse/PHOENIX-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096913#comment-15096913 ]
Hudson commented on PHOENIX-2143:
---------------------------------

FAILURE: Integrated in Phoenix-master #1074 (See [https://builds.apache.org/job/Phoenix-master/1074/])
PHOENIX-2143 Use guidepost bytes instead of region name in stats primary (jtaylor: rev 90cf5730058246914e7fc616c43f2837fd499824)
* phoenix-core/src/main/java/org/apache/phoenix/schema/stats/StatisticsScanner.java
* phoenix-core/src/main/java/org/apache/phoenix/schema/stats/StatisticsUtil.java
* phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataProtocol.java
* phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/StatsCollectorWithSplitsAndMultiCFIT.java
* phoenix-core/src/main/java/org/apache/phoenix/jdbc/PhoenixDatabaseMetaData.java
* phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionObserver.java
* phoenix-core/src/main/java/org/apache/phoenix/util/MetaDataUtil.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/StatsCollectorIT.java
* phoenix-core/src/main/java/org/apache/phoenix/schema/stats/StatisticsCollector.java
* phoenix-core/src/main/java/org/apache/phoenix/util/ScanUtil.java
* phoenix-core/src/main/java/org/apache/phoenix/schema/stats/StatisticsWriter.java
* phoenix-core/src/main/java/org/apache/phoenix/query/QueryConstants.java
* phoenix-core/src/main/java/org/apache/phoenix/schema/stats/GuidePostsInfo.java
* phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java


> Use guidepost bytes instead of region name in stats primary key
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-2143
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2143
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Ankit Singhal
>             Fix For: 4.7.0
>
>         Attachments: PHOENIX-2143.patch, PHOENIX-2143_v2.patch,
> PHOENIX-2143_v3.patch, PHOENIX-2143_v4.patch, PHOENIX-2143_v4_rebased.patch,
> PHOENIX-2143_wip.patch, PHOENIX-2143_wip_2.patch
>
>
> Our current SYSTEM.STATS table uses the region name as the last column in the
> primary key constraint. Instead, we should use the MIN_KEY column (which
> corresponds to the region start key). The advantage would be that the stats
> would then be ordered by region start key allowing us to approximate the
> number of guideposts which would be traversed given the start/stop row of a
> scan:
> {code}
> SELECT SUM(guide_posts_count) FROM SYSTEM.STATS WHERE min_key > :1 AND
> min_key < :2
> {code}
> where :1 is the start row and :2 is the stop row of the scan. With an UNNEST
> operator for ARRAYs, we could get a better approximation.
> As part of the upgrade to the new Phoenix version containing this fix, stats
> could simply be dropped and they'd be recalculated with the new schema.
> An alternative, even more granular approach would be to *not* use arrays to
> store the guide posts, but instead store them as individual rows with a
> schema like this.
> |PHYSICAL_NAME|VARCHAR|
> |COLUMN_FAMILY|VARCHAR|
> |GUIDE_POST_KEY|VARBINARY|
> In this alternative, the maintenance during compaction is higher, though, as
> you'd need to run a separate query to do the deletion of the old guideposts,
> followed by a commit of the new guideposts. The other disadvantage (besides
> requiring multiple queries) is that this couldn't be done transactionally.
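
For readers who want to see why ordering the stats by MIN_KEY makes the range sum in the quoted query cheap, here is a minimal in-memory sketch. It is not Phoenix code: the class GuidePostEstimate, its methods, and the sorted map standing in for the SYSTEM.STATS rows of one table and column family are all hypothetical names made up for illustration. It assumes keys compare as unsigned bytes in lexicographic order, which is how HBase orders row keys.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in Phoenix.
class GuidePostEstimate {

    // Stand-in for stats rows keyed by MIN_KEY, mapping to GUIDE_POSTS_COUNT.
    // Keys are ordered as unsigned bytes, lexicographically.
    static NavigableMap<byte[], Long> newStats() {
        Comparator<byte[]> unsignedOrder = Arrays::compareUnsigned;
        return new TreeMap<>(unsignedOrder);
    }

    // Mirrors: SELECT SUM(guide_posts_count) ... WHERE min_key > :1 AND min_key < :2
    // Both bounds are exclusive, as in the quoted query.
    // subMap throws IllegalArgumentException if startRow sorts after stopRow.
    static long estimateGuidePosts(NavigableMap<byte[], Long> stats,
                                   byte[] startRow, byte[] stopRow) {
        long total = 0;
        for (long count : stats.subMap(startRow, false, stopRow, false).values()) {
            total += count;
        }
        return total;
    }

    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        NavigableMap<byte[], Long> stats = newStats();
        stats.put(key("a"), 3L);
        stats.put(key("c"), 5L);
        stats.put(key("e"), 7L);
        stats.put(key("g"), 2L);
        // Only the entries strictly between "b" and "f" are counted.
        System.out.println(estimateGuidePosts(stats, key("b"), key("f")));
    }
}
```

Because the entries are sorted by start key, the range lookup touches only the rows between the scan's start and stop rows. With the region name as the last primary key column, the same sum would need to examine every stats row for the table.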