[jira] [Commented] (PHOENIX-1453) Collect row counts per region in stats table

James Taylor (JIRA) Sun, 11 Jan 2015 17:19:47 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273135#comment-14273135
 ]


James Taylor commented on PHOENIX-1453:
---------------------------------------

Just a few minor items:
- Minor nit: would you mind naming the deserialize method in GuidePostsInfo as 
deserializeGuidePostsInfo instead of deSerializeGuidePostsInfo?
- When calculating the percentage rowCount and byteCount on a split, you need 
to have two values: leftRowCount, leftByteCount, and rightRowCount, 
rightByteCount. Use the per variable as you've done to calculate the right 
ones, but use (1.0 - per) for the other ones. Otherwise, you've mistakenly 
increased the byte count and row count on a split if say the per variable is 
0.7. Would be good to have a test case that would have caught this (i.e. that 
the overall row count and byte count remain the same after a split).
{code}
+                double per = (double)(midEndIndex + 1) / size;
+                if (rowCountCell != null) {
+                    rowCount = 
PLong.INSTANCE.getCodec().decodeLong(rowCountCell.getValueArray(),
+                            rowCountCell.getValueOffset(), 
SortOrder.getDefault());
+                    rowCount = (long)(per * rowCount);
+                }
+                if (byteSizeCell != null) {
+                    byteSize = 
PLong.INSTANCE.getCodec().decodeLong(byteSizeCell.getValueArray(),
+                            byteSizeCell.getValueOffset(), 
SortOrder.getDefault());
+                    byteSize = (long)(per * byteSize);
+                }
                    if (midEndIndex > 0) {
-                       GuidePostsInfo lguidePosts = new 
GuidePostsInfo(byteSize, guidePosts.getGuidePosts().subList(0, midEndIndex));
-                       tracker.clear();
+                       GuidePostsInfo lguidePosts = new 
GuidePostsInfo(byteSize, guidePostsRegionInfo
+                            .getGuidePosts().subList(0, midEndIndex), 
rowCount);
+                    tracker.clear();
                        tracker.addGuidePost(cfKey, lguidePosts, byteSize, 
cell.getTimestamp());
                        addStats(l.getRegionName(), tracker, cfKey, mutations);
                    }
-                   if (midStartIndex < guidePosts.getGuidePosts().size()) {
-                       GuidePostsInfo rguidePosts = new 
GuidePostsInfo(byteSize, guidePosts.getGuidePosts().subList(midStartIndex, 
guidePosts.getGuidePosts().size()));
+                   if (midStartIndex < size) {
+                       GuidePostsInfo rguidePosts = new 
GuidePostsInfo(byteSize, guidePostsRegionInfo
+                            .getGuidePosts().subList(midStartIndex, size),
+                            rowCount);
{code}
- In PTableImpl, when the stats are serialized, it'd be a good idea to 
serialize the byteWidth as before (in addition to the new way you're doing 
which is definitely superior). The reason is that we allow a new server to be 
used with an old client. If you serialize the byteWidth in the old way *and* 
the new way, then old clients will continue to read the expected value.

+1 after those. Thanks for all the effort on this one, [~ramkrishna].



> Collect row counts per region in stats table
> --------------------------------------------
>
>                 Key: PHOENIX-1453
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1453
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: ramkrishna.s.vasudevan
>         Attachments: Phoenix-1453.patch, Phoenix-1453_1.patch, 
> Phoenix-1453_10.patch, Phoenix-1453_13.patch, Phoenix-1453_15.patch, 
> Phoenix-1453_17.patch, Phoenix-1453_2.patch, Phoenix-1453_3.patch, 
> Phoenix-1453_7.patch, Phoenix-1453_8.patch
>
>
> We currently collect guideposts per equal chunk, but we should also capture 
> row counts. Should we have a parallel array with the guideposts that count 
> rows per guidepost, or is it enough to have a per region count?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (PHOENIX-1453) Collect row counts per region in stats table

Reply via email to