[ https://issues.apache.org/jira/browse/PHOENIX-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273135#comment-14273135 ]
James Taylor commented on PHOENIX-1453: --------------------------------------- Just a few minor items: - Minor nit: would you mind naming the deserialize method in GuidePostsInfo as deserializeGuidePostsInfo instead of deSerializeGuidePostsInfo? - When calculating the percentage rowCount and byteCount on a split, you need to have two values: leftRowCount, leftByteCount, and rightRowCount, rightByteCount. Use the per variable as you've done to calculate the right ones, but use (1.0 - per) for the other ones. Otherwise, you've mistakenly increased the byte count and row count on a split if say the per variable is 0.7. Would be good to have a test case that would have caught this (i.e. that the overall row count and byte count remain the same after a split). {code} + double per = (double)(midEndIndex + 1) / size; + if (rowCountCell != null) { + rowCount = PLong.INSTANCE.getCodec().decodeLong(rowCountCell.getValueArray(), + rowCountCell.getValueOffset(), SortOrder.getDefault()); + rowCount = (long)(per * rowCount); + } + if (byteSizeCell != null) { + byteSize = PLong.INSTANCE.getCodec().decodeLong(byteSizeCell.getValueArray(), + byteSizeCell.getValueOffset(), SortOrder.getDefault()); + byteSize = (long)(per * byteSize); + } if (midEndIndex > 0) { - GuidePostsInfo lguidePosts = new GuidePostsInfo(byteSize, guidePosts.getGuidePosts().subList(0, midEndIndex)); - tracker.clear(); + GuidePostsInfo lguidePosts = new GuidePostsInfo(byteSize, guidePostsRegionInfo + .getGuidePosts().subList(0, midEndIndex), rowCount); + tracker.clear(); tracker.addGuidePost(cfKey, lguidePosts, byteSize, cell.getTimestamp()); addStats(l.getRegionName(), tracker, cfKey, mutations); } - if (midStartIndex < guidePosts.getGuidePosts().size()) { - GuidePostsInfo rguidePosts = new GuidePostsInfo(byteSize, guidePosts.getGuidePosts().subList(midStartIndex, guidePosts.getGuidePosts().size())); + if (midStartIndex < size) { + GuidePostsInfo rguidePosts = new GuidePostsInfo(byteSize, guidePostsRegionInfo + .getGuidePosts().subList(midStartIndex, size), + rowCount); {code} - In PTableImpl, when the stats are serialized, it'd be a good idea to serialize the byteWidth as before (in addition to the new way you're doing which is definitely superior). The reason is that we allow a new server to be used with an old client. If you serialize the byteWidth in the old way *and* the new way, then old clients will continue to read the expected value. +1 after those. Thanks for all the effort on this one, [~ramkrishna]. > Collect row counts per region in stats table > -------------------------------------------- > > Key: PHOENIX-1453 > URL: https://issues.apache.org/jira/browse/PHOENIX-1453 > Project: Phoenix > Issue Type: Sub-task > Reporter: James Taylor > Assignee: ramkrishna.s.vasudevan > Attachments: Phoenix-1453.patch, Phoenix-1453_1.patch, > Phoenix-1453_10.patch, Phoenix-1453_13.patch, Phoenix-1453_15.patch, > Phoenix-1453_17.patch, Phoenix-1453_2.patch, Phoenix-1453_3.patch, > Phoenix-1453_7.patch, Phoenix-1453_8.patch > > > We currently collect guideposts per equal chunk, but we should also capture > row counts. Should we have a parallel array with the guideposts that count > rows per guidepost, or is it enough to have a per region count? -- This message was sent by Atlassian JIRA (v6.3.4#6332)