[jira] [Commented] (PHOENIX-2143) Use guidepost bytes instead of region name in stats primary key

James Taylor (JIRA) Tue, 12 Jan 2016 22:58:47 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095722#comment-15095722
 ]


James Taylor commented on PHOENIX-2143:
---------------------------------------

I understand the subtracting one from MIN_SYSTEM_TABLE_TIMESTAMP_4_7_0 for the 
deletion of the SYSTEM.STATS rows, but it shouldn't be necessary for the 
addColumn call for the new TRANSACTIONAL column. Make sure in your manual 
testing that you haven't already upgraded to 4.7.0, as it's a one-shot upgrade. 
You need to start from a "pristine" pre-4.7.0 system to test upgrade.

I don't think we need another constant for 
MetaDataProtocol.MIN_SYSTEM_TABLE_TIMESTAMP_4_7_0_1. Let's just subtract one in 
the call to dropStatsTable.

Mind confirming and tweaking your patch then I can get this committed? Thanks 
so much for the patch, [~ankit.singhal].

> Use guidepost bytes instead of region name in stats primary key
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-2143
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2143
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Ankit Singhal
>         Attachments: PHOENIX-2143.patch, PHOENIX-2143_v2.patch, 
> PHOENIX-2143_v3.patch, PHOENIX-2143_wip.patch, PHOENIX-2143_wip_2.patch
>
>
> Our current SYSTEM.STATS table uses the region name as the last column in the 
> primary key constraint. Instead, we should use the MIN_KEY column (which 
> corresponds to the region start key). The advantage would be that the stats 
> would then be ordered by region start key allowing us to approximate the 
> number of guideposts which would be traversed given the start/stop row of a 
> scan:
> {code}
> SELECT SUM(guide_posts_count) FROM SYSTEM.STATS WHERE min_key > :1 AND 
> min_key < :2
> {code}
> where :1 is the start row and :2 is the stop row of the scan. With an UNNEST 
> operator for ARRAYs, we could get a better approximation.
> As part of the upgrade to the new Phoenix version containing this fix, stats 
> could simply be dropped and they'd be recalculated with the new schema.
> An alternative, even more granular approach would be to *not* use arrays to 
> store the guide posts, but instead store them as individual rows with a 
> schema like this.
> |PHYSICAL_NAME|VARCHAR|
> |COLUMN_FAMILY|VARCHAR|
> |GUIDE_POST_KEY|VARBINARY|
> In this alternative, the maintenance during compaction is higher, though, as 
> you'd need to run a separate query to do the deletion of the old guideposts, 
> followed by a commit of the new guideposts. The other disadvantage (besides 
> requiring multiple queries) is that this couldn't be done transactionally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (PHOENIX-2143) Use guidepost bytes instead of region name in stats primary key

Reply via email to