[ 
https://issues.apache.org/jira/browse/PHOENIX-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15103353#comment-15103353
 ] 

ASF GitHub Bot commented on PHOENIX-2417:
-----------------------------------------

Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49935621
  
    --- Diff: phoenix-protocol/src/main/PTable.proto ---
    @@ -52,11 +52,12 @@ message PColumn {
     
     message PTableStats {
       required bytes key = 1;
    -  repeated bytes values = 2;
    +  optional bytes guidePosts = 2;
    --- End diff --
    
    Let's do something in the middle. We can stick with the plan that this 
still goes into the 4.7.0 release, but we can do the above in 
MetaDataRegionObserver to ensure that the SYSTEM.STATS table is truncated. 
Here's what needs to be done:
    * conditionally truncate the SYSTEM.STATS table in 
MetaDataRegionObserver.postOpen() based on checkAndPut
    * keep the values field at protobuf position 2 and return an empty 
PGuidePosts for that field. We'll document that stats are essentially disabled 
for an old client once you upgrade your server (but nothing will break).
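
The first bullet relies on an atomic check-then-write so that only one caller 
performs the truncation. Below is a minimal in-memory sketch of that gating 
idea, not Phoenix or HBase code: the class OneTimeTruncateGate, its method 
names, and the marker key are hypothetical, and a ConcurrentHashMap stands in 
for the marker cell that a real checkAndPut would guard.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a checkAndPut-guarded marker cell.
// Assumption: the real code would write a marker via HBase checkAndPut.
final class OneTimeTruncateGate {
    private final ConcurrentMap<String, String> markers = new ConcurrentHashMap<>();

    /**
     * Runs the truncate action only if no marker exists yet for markerKey.
     * Returns true if this call performed the truncation.
     */
    boolean truncateOnce(String markerKey, Runnable truncate) {
        // putIfAbsent is atomic: exactly one caller observes null.
        if (markers.putIfAbsent(markerKey, "done") != null) {
            return false; // someone else already claimed the truncation
        }
        truncate.run();
        return true;
    }
}
```

Calling truncateOnce twice with the same marker key runs the action once; the 
second call returns false without touching the table.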


> Compress memory used by row key byte[] of guideposts
> ----------------------------------------------------
>
>                 Key: PHOENIX-2417
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2417
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Ankit Singhal
>             Fix For: 4.7.0
>
>         Attachments: PHOENIX-2417.patch, PHOENIX-2417_encoder.diff, 
> PHOENIX-2417_v2_wip.patch
>
>
> We've found that smaller guideposts are better in terms of minimizing any 
> increase in latency for point scans. However, this increases the amount of 
> memory significantly when caching the guideposts on the client. Guideposts are 
> equidistant row keys in the form of raw byte[] which are likely to have a 
> large percentage of their leading bytes in common (as they're stored in 
> sorted order). We should use a simple compression technique to mitigate this. 
> I noticed that Apache Parquet has a run-length encoding - perhaps we can use 
> that.
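
To illustrate why shared leading bytes make sorted row keys compressible, here 
is a minimal sketch of prefix (front) coding. This is a swapped-in technique 
for illustration, not Parquet's run-length encoding and not the encoder from 
the attached patches; the class GuidePostPrefixCodec and its layout (a key 
count, then per key a shared-prefix length, a suffix length, and the suffix 
bytes) are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Phoenix: front-codes sorted row keys.
final class GuidePostPrefixCodec {

    // Writes a 4-byte big-endian int (matches ByteBuffer's default order).
    private static void writeInt(ByteArrayOutputStream out, int v) {
        out.write(ByteBuffer.allocate(4).putInt(v).array(), 0, 4);
    }

    /** Encodes keys, storing only the bytes that differ from the previous key. */
    static byte[] encode(List<byte[]> sortedKeys) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeInt(out, sortedKeys.size());
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            // Count the leading bytes shared with the previous key.
            int shared = 0;
            int max = Math.min(prev.length, key.length);
            while (shared < max && prev[shared] == key[shared]) {
                shared++;
            }
            writeInt(out, shared);
            writeInt(out, key.length - shared);
            out.write(key, shared, key.length - shared);
            prev = key;
        }
        return out.toByteArray();
    }

    /** Rebuilds the full keys from the encoded form. */
    static List<byte[]> decode(byte[] encoded) {
        ByteBuffer in = ByteBuffer.wrap(encoded);
        int count = in.getInt();
        List<byte[]> keys = new ArrayList<>(count);
        byte[] prev = new byte[0];
        for (int i = 0; i < count; i++) {
            int shared = in.getInt();
            int suffixLen = in.getInt();
            // Start from the previous key's prefix, then fill in the suffix.
            byte[] key = Arrays.copyOf(prev, shared + suffixLen);
            in.get(key, shared, suffixLen);
            keys.add(key);
            prev = key;
        }
        return keys;
    }
}
```

The fixed 4-byte length fields keep the sketch simple but waste space on short 
keys; a practical encoder would likely use variable-length integers so that 
the savings from shared prefixes are not eaten by the headers.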



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
