[ https://issues.apache.org/jira/browse/PHOENIX-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095523#comment-15095523 ]
James Taylor commented on PHOENIX-2143:
---------------------------------------

Thanks for the patch, [~ankit.singhal]. Overall, it looks very good. Just a couple of questions and minor nits (and I agree with [~samarthjain] about the formatting changes, but we can give you a free pass this time, as the formatting changes are mostly around code you've already changed).

One question about the upgrade code in ConnectionQueryServicesImpl:
* Is this change necessary (to use a ts of MIN_SYSTEM_TABLE_TIMESTAMP_4_7_0 - 1)? If not, please revert that line.
{code}
                 metaConnection = addColumn(metaConnection, PhoenixDatabaseMetaData.SYSTEM_CATALOG,
-                        MetaDataProtocol.MIN_SYSTEM_TABLE_TIMESTAMP_4_7_0, columnsToAdd, false);
+                        MetaDataProtocol.MIN_SYSTEM_TABLE_TIMESTAMP_4_7_0 - 1, columnsToAdd, false);
{code}
* Also, did you test the upgrade path manually with a pre-4.7.0 client and a 4.7.0 server?

Minor nit here in MetaDataUtil: can you create a new ScanUtil.setTimeRange(long minTime, long maxTime) that throws a RuntimeException instead of an IOException, and then use that instead of scan.setTimeRange(), to prevent the IOException having to be thrown from this function? (See the sketch after the code blocks below.)
{code}
--- a/phoenix-core/src/main/java/org/apache/phoenix/util/MetaDataUtil.java
+++ b/phoenix-core/src/main/java/org/apache/phoenix/util/MetaDataUtil.java
@@ -436,24 +436,32 @@ public class MetaDataUtil {
     public static final String IS_LOCAL_INDEX_TABLE_PROP_NAME = "IS_LOCAL_INDEX_TABLE";
     public static final byte[] IS_LOCAL_INDEX_TABLE_PROP_BYTES = Bytes.toBytes(IS_LOCAL_INDEX_TABLE_PROP_NAME);
 
-    public static Scan newTableRowsScan(byte[] key, long startTimeStamp, long stopTimeStamp)
+    public static Scan newTableRowsScan(byte[] key, long startTimeStamp, long stopTimeStamp) throws IOException {
+        return newTableRowsScan(key, null, startTimeStamp, stopTimeStamp);
+    }
+
+    public static Scan newTableRowsScan(byte[] startKey, byte[] stopKey, long startTimeStamp, long stopTimeStamp)
             throws IOException {
         Scan scan = new Scan();
         scan.setTimeRange(startTimeStamp, stopTimeStamp);
-        scan.setStartRow(key);
-        byte[] stopKey = ByteUtil.concat(key, QueryConstants.SEPARATOR_BYTE_ARRAY);
-        ByteUtil.nextKey(stopKey, stopKey.length);
+        scan.setStartRow(startKey);
+        if (stopKey == null) {
+            stopKey = ByteUtil.concat(startKey, QueryConstants.SEPARATOR_BYTE_ARRAY);
+            ByteUtil.nextKey(stopKey, stopKey.length);
+        }
         scan.setStopRow(stopKey);
         return scan;
     }
{code}

Also, it looks like you won't need to call this line anymore in StatisticsUtil.readStatistics, as you're setting the time range in the util function already:
{code}
+        s.setTimeRange(MetaDataProtocol.MIN_TABLE_TIMESTAMP, clientTimeStamp);
{code}
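For reference, something along these lines for the ScanUtil helper suggested above (untested sketch only; the class name, placement, and exact signature are illustrative assumptions, not necessarily what the final patch will use):
{code}
import java.io.IOException;

import org.apache.hadoop.hbase.client.Scan;

// Hypothetical sketch of a ScanUtil-style helper: it wraps the checked IOException
// thrown by Scan.setTimeRange() in a RuntimeException, so callers such as
// MetaDataUtil.newTableRowsScan() no longer need to declare throws IOException.
public class ScanUtilSketch {
    public static void setTimeRange(Scan scan, long minTime, long maxTime) {
        try {
            scan.setTimeRange(minTime, maxTime);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
{code}
MetaDataUtil.newTableRowsScan() could then call this helper in place of scan.setTimeRange(startTimeStamp, stopTimeStamp) and drop its own throws IOException clause.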
> Use guidepost bytes instead of region name in stats primary key
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-2143
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2143
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Ankit Singhal
>         Attachments: PHOENIX-2143.patch, PHOENIX-2143_v2.patch, PHOENIX-2143_wip.patch, PHOENIX-2143_wip_2.patch
>
>
> Our current SYSTEM.STATS table uses the region name as the last column in the primary key constraint. Instead, we should use the MIN_KEY column (which corresponds to the region start key). The advantage would be that the stats would then be ordered by region start key, allowing us to approximate the number of guideposts which would be traversed given the start/stop row of a scan:
> {code}
> SELECT SUM(guide_posts_count) FROM SYSTEM.STATS WHERE min_key > :1 AND min_key < :2
> {code}
> where :1 is the start row and :2 is the stop row of the scan. With an UNNEST operator for ARRAYs, we could get a better approximation.
> As part of the upgrade to the new Phoenix version containing this fix, stats could simply be dropped and they'd be recalculated with the new schema.
> An alternative, even more granular approach would be to *not* use arrays to store the guide posts, but instead store them as individual rows with a schema like this:
> |PHYSICAL_NAME|VARCHAR|
> |COLUMN_FAMILY|VARCHAR|
> |GUIDE_POST_KEY|VARBINARY|
> In this alternative, the maintenance during compaction is higher, though, as you'd need to run a separate query to do the deletion of the old guideposts, followed by a commit of the new guideposts. The other disadvantage (besides requiring multiple queries) is that this couldn't be done transactionally.
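A rough illustration of the maintenance cost described in that alternative (the table and column names simply mirror the schema sketched above; the SQL and API usage are illustrative assumptions, not part of any patch):
{code}
import java.sql.Connection;
import java.sql.PreparedStatement;

// Hypothetical sketch: with one row per guidepost, replacing the stats for a table
// after a compaction takes a delete plus a batch of upserts, issued as separate
// statements. Without transactions, a failure between the steps can leave the
// stats partially updated.
public class GuidePostMaintenanceSketch {
    public static void replaceGuidePosts(Connection conn, String physicalName,
            String columnFamily, byte[][] newGuidePosts) throws Exception {
        // Step 1: delete the old guideposts for this physical table / column family.
        try (PreparedStatement delete = conn.prepareStatement(
                "DELETE FROM SYSTEM.STATS WHERE PHYSICAL_NAME = ? AND COLUMN_FAMILY = ?")) {
            delete.setString(1, physicalName);
            delete.setString(2, columnFamily);
            delete.execute();
        }
        // Step 2: upsert the newly collected guideposts, one row per guidepost key.
        try (PreparedStatement upsert = conn.prepareStatement(
                "UPSERT INTO SYSTEM.STATS (PHYSICAL_NAME, COLUMN_FAMILY, GUIDE_POST_KEY) VALUES (?, ?, ?)")) {
            for (byte[] guidePostKey : newGuidePosts) {
                upsert.setString(1, physicalName);
                upsert.setString(2, columnFamily);
                upsert.setBytes(3, guidePostKey);
                upsert.execute();
            }
        }
        // Step 3: commit the pending mutations; the delete and the upserts remain
        // separate operations rather than one atomic swap.
        conn.commit();
    }
}
{code}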