[ https://issues.apache.org/jira/browse/PHOENIX-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276467#comment-14276467 ]
James Taylor commented on PHOENIX-1453:
---------------------------------------
Thanks, [~ramkrishna]. Not a huge deal, but I don't quite understand the
special case you've created. If there are 2 guideposts and a split occurs,
what is midEndIndex? Assuming midEndIndex=1, I don't think we should add 1
when computing per, since per should be 0.5 in this case (the split will end
up with 50% of the guideposts in the left region and 50% in the right
region). If you removed the +1 and just set per = ((double)(midEndIndex)) /
size, would that remove the need for the special case?
{code}
+        double per = (double)(midEndIndex + 1) / size;
+        long leftRowCount = 0;
+        long rightRowCount = 0;
+        long leftByteCount = 0;
+        long rightByteCount = 0;
+        if (rowCountCell != null) {
+            rowCount = PLong.INSTANCE.getCodec().decodeLong(rowCountCell.getValueArray(),
+                    rowCountCell.getValueOffset(), SortOrder.getDefault());
+            leftRowCount = (long)(per * rowCount);
+            if (leftRowCount == rowCount) {
+                leftRowCount = (rightRowCount = rowCount / 2);
+            } else {
+                rightRowCount = (long)((1 - per) * rowCount);
+            }
+        }
{code}
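To make the arithmetic in the question concrete, here is a minimal standalone sketch comparing the two ways of computing per for the 2-guidepost case. It is not code from the patch: the class and method names are hypothetical, and midEndIndex=1 and rowCount=100 are assumed values chosen only for illustration.

```java
public class SplitFractionSketch {
    // Fraction as computed in the patch excerpt: (midEndIndex + 1) / size.
    static double perWithPlusOne(int midEndIndex, int size) {
        return (double) (midEndIndex + 1) / size;
    }

    // Fraction as suggested in the comment: midEndIndex / size.
    static double perWithoutPlusOne(int midEndIndex, int size) {
        return (double) midEndIndex / size;
    }

    // Splits rowCount into {left, right} using per, with no special case.
    static long[] splitRowCount(double per, long rowCount) {
        long left = (long) (per * rowCount);
        long right = (long) ((1 - per) * rowCount);
        return new long[] { left, right };
    }

    public static void main(String[] args) {
        int size = 2;          // two guideposts
        int midEndIndex = 1;   // assumed value, as in the comment
        long rowCount = 100;   // assumed value for illustration

        double withPlusOne = perWithPlusOne(midEndIndex, size);
        double withoutPlusOne = perWithoutPlusOne(midEndIndex, size);
        long[] a = splitRowCount(withPlusOne, rowCount);
        long[] b = splitRowCount(withoutPlusOne, rowCount);

        System.out.println("with +1:    per=" + withPlusOne
                + " left=" + a[0] + " right=" + a[1]);
        System.out.println("without +1: per=" + withoutPlusOne
                + " left=" + b[0] + " right=" + b[1]);
    }
}
```

Under these assumptions, the +1 form gives per = 1.0, so the left count equals the full row count, which is exactly the condition the special case in the patch guards against. Without the +1, per = 0.5 and the counts split evenly with no extra branch.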
> Collect row counts per region in stats table
> --------------------------------------------
>
> Key: PHOENIX-1453
> URL: https://issues.apache.org/jira/browse/PHOENIX-1453
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: James Taylor
> Assignee: ramkrishna.s.vasudevan
> Attachments: Phoenix-1453.patch, Phoenix-1453_1.patch,
> Phoenix-1453_10.patch, Phoenix-1453_13.patch, Phoenix-1453_15.patch,
> Phoenix-1453_17.patch, Phoenix-1453_18.patch, Phoenix-1453_2.patch,
> Phoenix-1453_20.patch, Phoenix-1453_3.patch, Phoenix-1453_7.patch,
> Phoenix-1453_8.patch
>
>
> We currently collect guideposts per equal chunk, but we should also capture
> row counts. Should we have a parallel array with the guideposts that count
> rows per guidepost, or is it enough to have a per region count?