[ https://issues.apache.org/jira/browse/PHOENIX-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108752#comment-15108752 ]
ASF GitHub Bot commented on PHOENIX-2417:
-----------------------------------------

Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r50270038

    --- Diff: phoenix-protocol/src/main/PTable.proto ---
    @@ -57,6 +57,8 @@ message PTableStats {
         optional int64 keyBytesCount = 4;
         optional int32 guidePostsCount = 5;
         optional PGuidePosts pGuidePosts = 6;
    +    optional bytes encodedGuidePosts = 7;
    --- End diff --

    I'm a bit confused by this, though. Doesn't the PTable still send multiple PGuidePosts (one per column family)? It's just the PGuidePosts that have changed, no? You already have maxLength on PGuidePosts (which is where it belongs). Why is it needed here?

> Compress memory used by row key byte[] of guideposts
> ----------------------------------------------------
>
>                 Key: PHOENIX-2417
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2417
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Ankit Singhal
>             Fix For: 4.7.0
>
>         Attachments: PHOENIX-2417.patch, PHOENIX-2417_encoder.diff,
>                      PHOENIX-2417_rebased.patch, PHOENIX-2417_v2_wip.patch,
>                      StatsUpgrade_wip.patch
>
>
> We've found that smaller guideposts are better in terms of minimizing any
> increase in latency for point scans. However, this significantly increases the
> amount of memory used when caching the guideposts on the client. Guideposts are
> equidistant row keys in the form of raw byte[], which are likely to have a
> large percentage of their leading bytes in common (as they're stored in
> sorted order). We should use a simple compression technique to mitigate this.
> I noticed that Apache Parquet has a run-length encoding; perhaps we can use
> that.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
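Because guideposts are sorted keys that share long leading byte runs, the classic fit is front coding (prefix compression) rather than run-length encoding, which targets repeated values instead of shared prefixes. Parquet documents the same idea as its DELTA_BYTE_ARRAY encoding. The following is a minimal sketch under stated assumptions, not the format used by the PHOENIX-2417 patch or the encodedGuidePosts field: each key is stored as (shared prefix length with the previous key, suffix length, suffix bytes), with lengths as hypothetical unsigned varints, and the class name FrontCodingSketch is made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical front-coding sketch for sorted row keys (not Phoenix's actual codec).
public class FrontCodingSketch {

    // Number of leading bytes a and b have in common.
    static int commonPrefix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    // Unsigned varint: 7 bits per byte, high bit set means "more bytes follow".
    static void writeVInt(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    static int readVInt(ByteBuffer in) {
        int result = 0;
        int shift = 0;
        byte b;
        do {
            b = in.get();
            result |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    // Layout: count, then per key: sharedPrefixLen, suffixLen, suffix bytes.
    static byte[] encode(List<byte[]> sortedKeys) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, sortedKeys.size());
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int shared = commonPrefix(prev, key);
            writeVInt(out, shared);
            writeVInt(out, key.length - shared);
            out.write(key, shared, key.length - shared);
            prev = key;
        }
        return out.toByteArray();
    }

    // Rebuilds each key from the previous key's prefix plus the stored suffix.
    static List<byte[]> decode(byte[] encoded) {
        ByteBuffer in = ByteBuffer.wrap(encoded);
        int count = readVInt(in);
        List<byte[]> keys = new ArrayList<>(count);
        byte[] prev = new byte[0];
        for (int k = 0; k < count; k++) {
            int shared = readVInt(in);
            int suffixLen = readVInt(in);
            byte[] key = new byte[shared + suffixLen];
            System.arraycopy(prev, 0, key, 0, shared);
            in.get(key, shared, suffixLen);
            keys.add(key);
            prev = key;
        }
        return keys;
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>();
        for (String s : new String[] {"row-000100", "row-000200", "row-000300", "row-001000"}) {
            keys.add(s.getBytes(StandardCharsets.UTF_8));
        }
        byte[] encoded = encode(keys);
        int raw = 0;
        for (byte[] key : keys) {
            raw += key.length;
        }
        List<byte[]> decoded = decode(encoded);
        boolean same = decoded.size() == keys.size();
        for (int i = 0; same && i < keys.size(); i++) {
            same = Arrays.equals(keys.get(i), decoded.get(i));
        }
        // prints: raw=40 encoded=29 roundTrip=true
        System.out.println("raw=" + raw + " encoded=" + encoded.length + " roundTrip=" + same);
    }
}
```

Decoding is sequential: key i can only be rebuilt after key i-1, so a lookup into the middle of the list has to walk from the start. Real implementations often add periodic "restart" keys stored in full to bound that cost, at a small space penalty.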