[jira] [Commented] (PHOENIX-2417) Compress memory used by row key byte[] of guideposts

ASF GitHub Bot (JIRA) Sat, 16 Jan 2016 10:03:11 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15103291#comment-15103291
 ]


ASF GitHub Bot commented on PHOENIX-2417:
-----------------------------------------

Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49935021
  
    --- Diff: phoenix-protocol/src/main/PTable.proto ---
    @@ -52,11 +52,12 @@ message PColumn {
     
     message PTableStats {
       required bytes key = 1;
    -  repeated bytes values = 2;
    +  optional bytes guidePosts = 2;
    --- End diff --
    
    There's a pretty big backward compatibility issue due to PHOENIX-2143 and 
this one. The case you'll need to make work is an old pre-4.7.0 client that's 
running against a new 4.7.0 server. The client will expect the stats to be in 
the original format. In the following call:
    
        public void getTable(RpcController controller, GetTableRequest request,
                RpcCallback<MetaDataResponse> done) {
    
    You'll need to pass request.getClientVersion() through doGetTable(), into 
getTable() and finally into StatisticsUtil.readStatistics(). You should 
preserve the old code (we can dump it when we do a major release), and use that 
code path if the stats have not been regenerated yet. You can detect this based 
on the existence of the GUIDE_POSTS key value (which you'll want to project 
into the scan for the new code for this backward-compatibility case). If the 
stats have been regenerated, there are two cases: the client is pre-4.7.0, in 
which case you'd want to use the new code but put the data in the old format, 
or the client is 4.7.0 or above, in which case your existing code is fine.
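
    The cases above can be sketched as a small decision function. This is 
only an illustration under stated assumptions: the class, enum, and method 
names and the integer version encoding are hypothetical and are not Phoenix 
APIs. In the actual change the two inputs would come from 
request.getClientVersion() and from the GUIDE_POSTS key value check described 
above.

```java
/** Hypothetical sketch; none of these names are Phoenix APIs. */
final class StatsReadPathSketch {

    /** Which stats-reading code to run and, for the new code, which format to return. */
    enum ReadPath {
        OLD_CODE,             // stats not regenerated yet: preserved old code path
        NEW_CODE_OLD_FORMAT,  // stats regenerated, client older than 4.7.0
        NEW_CODE_NEW_FORMAT   // stats regenerated, client 4.7.0 or above
    }

    /** Assumed encoding, for illustration only: one 8-bit field per version part. */
    static int encodeVersion(int major, int minor, int patch) {
        return (major << 16) | (minor << 8) | patch;
    }

    static final int FIRST_NEW_FORMAT_CLIENT = encodeVersion(4, 7, 0);

    /**
     * @param clientVersion    version reported by the client, in the assumed encoding
     * @param statsRegenerated whether the stats have already been rewritten in the new layout
     */
    static ReadPath choose(int clientVersion, boolean statsRegenerated) {
        if (!statsRegenerated) {
            return ReadPath.OLD_CODE;
        }
        return clientVersion < FIRST_NEW_FORMAT_CLIENT
                ? ReadPath.NEW_CODE_OLD_FORMAT
                : ReadPath.NEW_CODE_NEW_FORMAT;
    }
}
```

    For example, choose(encodeVersion(4, 6, 0), true) selects 
NEW_CODE_OLD_FORMAT, which is the case an old client needs against a new 
server.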
    
    With PHOENIX-2143, when compaction runs, we'll generate stats in the new 
format. It's possible that the SYSTEM.STATS table hasn't been updated yet (as 
this gets triggered when a new 4.7.0 client connects to the server which may 
not yet have happened). We'd need to issue the previous Delete marker based on 
the old row key structure to ensure that the stats for the region are deleted. 
We wouldn't want to issue the query that does the range delete in this case 
because it might delete rows across multiple regions (ugh). So we'd need to 
know if the schema upgrade has been done yet when compaction runs. We could 
detect this by querying the SYSTEM.CATALOG table directly or by using the 
MetaDataProtocol.getTable() call and pulling over the PTable and then 
conditionally do the delete the old way versus the new way.
    
    WDYT, @ankitsinghal? A more radical alternative would be to call this 
release 5.0. Users could still upgrade the server and client as with a minor 
release, but they'd need to truncate the SYSTEM.STATS table manually before 
upgrading the server. In that case, I think it'd be acceptable to return an 
empty guidepost for the protobuf values field (as essentially stats would be 
disabled for older clients running against the newer server).



> Compress memory used by row key byte[] of guideposts
> ----------------------------------------------------
>
>                 Key: PHOENIX-2417
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2417
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Ankit Singhal
>             Fix For: 4.7.0
>
>         Attachments: PHOENIX-2417.patch, PHOENIX-2417_encoder.diff, 
> PHOENIX-2417_v2_wip.patch
>
>
> We've found that smaller guideposts are better in terms of minimizing any 
> increase in latency for point scans. However, this significantly increases 
> the memory used when caching the guideposts on the client. Guideposts are 
> equidistant row keys in the form of raw byte[] which are likely to have a 
> large percentage of their leading bytes in common (as they're stored in 
> sorted order). We should use a simple compression technique to mitigate this. 
> I noticed that Apache Parquet has a run length encoding - perhaps we can use 
> that.
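
To illustrate why shared leading bytes make sorted guideposts compressible, 
here is a minimal sketch of common-prefix (front) encoding. It is not the 
encoder from the attached patches and not Parquet's run length encoding; the 
class name, method names, and byte layout are assumptions made for this 
example. Each key is stored as the number of leading bytes it shares with the 
previous key, followed by its remaining suffix.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical prefix codec for sorted row keys; layout chosen for illustration only. */
final class GuidePostPrefixCodecSketch {

    /** Layout: [count] then, per key, [sharedPrefixLen][suffixLen][suffix bytes]. */
    static byte[] encode(List<byte[]> sortedKeys) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeInt(out, sortedKeys.size());
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            // Count the leading bytes this key shares with the previous one.
            int shared = 0;
            int max = Math.min(prev.length, key.length);
            while (shared < max && prev[shared] == key[shared]) {
                shared++;
            }
            writeInt(out, shared);
            writeInt(out, key.length - shared);
            out.write(key, shared, key.length - shared);
            prev = key;
        }
        return out.toByteArray();
    }

    /** Rebuilds every key by copying the shared prefix from the previously decoded key. */
    static List<byte[]> decode(byte[] encoded) {
        ByteBuffer in = ByteBuffer.wrap(encoded);
        int count = in.getInt();
        List<byte[]> keys = new ArrayList<>();
        byte[] prev = new byte[0];
        for (int i = 0; i < count; i++) {
            int shared = in.getInt();
            int suffixLen = in.getInt();
            byte[] key = new byte[shared + suffixLen];
            System.arraycopy(prev, 0, key, 0, shared);
            in.get(key, shared, suffixLen);
            keys.add(key);
            prev = key;
        }
        return keys;
    }

    private static void writeInt(ByteArrayOutputStream out, int value) {
        out.write(ByteBuffer.allocate(4).putInt(value).array(), 0, 4);
    }
}
```

The fixed 4-byte lengths keep the sketch short but cost 8 bytes per key, so 
this layout only saves space when keys share long prefixes; a 
variable-length integer encoding would reduce that overhead.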



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
