[ https://issues.apache.org/jira/browse/PHOENIX-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15103291#comment-15103291 ]
ASF GitHub Bot commented on PHOENIX-2417: ----------------------------------------- Github user JamesRTaylor commented on a diff in the pull request: https://github.com/apache/phoenix/pull/147#discussion_r49935021 --- Diff: phoenix-protocol/src/main/PTable.proto --- @@ -52,11 +52,12 @@ message PColumn { message PTableStats { required bytes key = 1; - repeated bytes values = 2; + optional bytes guidePosts = 2; --- End diff -- There's a pretty big backward compatibility issue due to PHOENIX-2143 and this one. The case you'll need to make work is an old pre 4.7.0 client that's running against a new 4.7.0 server. The client will expect the stats to be in the original format. In the following call: public void getTable(RpcController controller, GetTableRequest request, RpcCallback<MetaDataResponse> done) { You'll need to pass request.getClientVersion() through doGetTable(), into getTable() and finally into StatisticsUtil.readStatistics(). You should preserve the old code (we can dump it when we do a major release), and use that code path if the stats have not been regenerated yet. You can detect this based on the existence of the GUIDE_POSTS key value (which you'll want to project into the scan for the new code for this b/w compatibility case). If the stats have been regenerated, there'd be two cases: the client is pre 4.7.0 in which case you'd want to use the new code but put the data in the old format, or the client is 4.7.0 or above in which case your existing code is fine. With PHOENIX-2143, when compaction runs, we'll generate stats in the new format. It's possible that the SYSTEM.STATS table hasn't been updated yet (as this gets triggered when a new 4.7.0 client connects to the server which may not yet have happened). We'd need to issue the previous Delete marker based on the old row key structure to ensure that the stats for the region are deleted. We wouldn't want to issue the query that does the range delete in this case because it might delete rows across multiple regions (ugh). So we'd need to know if the schema upgrade has been done yet when compaction runs. We could detect this by querying the SYSTEM.CATALOG table directly or by using the MetaDataProtocol.getTable() call and pulling over the PTable and then conditionally do the delete the old way versus the new way. WDYT, @ankitsinghal? A more radical alternative would be to call this release 5.0. Users could still upgrade the server and client as with a minor release, but they'd need to truncate the SYSTEM.STATS table manually before upgrading the server. In that case, I think it'd be acceptable to return an empty guidepost for the protobuf values field (as essentially stats would be disabled for older clients running against the newer server). > Compress memory used by row key byte[] of guideposts > ---------------------------------------------------- > > Key: PHOENIX-2417 > URL: https://issues.apache.org/jira/browse/PHOENIX-2417 > Project: Phoenix > Issue Type: Sub-task > Reporter: James Taylor > Assignee: Ankit Singhal > Fix For: 4.7.0 > > Attachments: PHOENIX-2417.patch, PHOENIX-2417_encoder.diff, > PHOENIX-2417_v2_wip.patch > > > We've found that smaller guideposts are better in terms of minimizing any > increase in latency for point scans. However, this increases the amount of > memory significantly when caching the guideposts on the client. Guidepost are > equidistant row keys in the form of raw byte[] which are likely to have a > large percentage of their leading bytes in common (as they're stored in > sorted order. We should use a simple compression technique to mitigate this. > I noticed that Apache Parquet has a run length encoding - perhaps we can use > that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)