[ 
https://issues.apache.org/jira/browse/PHOENIX-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216641#comment-14216641
 ] 

Lars Hofhansl commented on PHOENIX-1453:
----------------------------------------

One use is to estimate the number of rows in the table without having to scan 
it in its entirety.
Also, in HBase there is a cost per row and a cost per byte. Having row counts 
would allow us to make better decisions about parallelization. It might even 
be good to count the number of columns.

The risk here is that this would require new comparisons per Cell to detect 
when a new row/column starts. As we've seen before, this adds measurable CPU 
cost to compactions.
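
To make the per-Cell comparison cost concrete, here is a minimal sketch of 
counting rows and columns over a sorted stream of cells. It does not use the 
HBase API: SimpleCell, RowColumnCounter, and count are hypothetical names 
invented for illustration, and the sketch assumes cells arrive sorted by row 
key and then by column qualifier, as they would during a compaction scan.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class RowColumnCounter {

    /** Hypothetical stand-in for an HBase Cell: only row key and qualifier. */
    public static final class SimpleCell {
        final byte[] row;
        final byte[] qualifier;

        public SimpleCell(String row, String qualifier) {
            this.row = row.getBytes(StandardCharsets.UTF_8);
            this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
        }
    }

    /**
     * Returns {rowCount, columnCount} for cells sorted by row, then qualifier.
     * Each cell costs one row comparison and, when the row is unchanged, one
     * qualifier comparison. This is the extra per-Cell work discussed above.
     */
    public static long[] count(List<SimpleCell> sortedCells) {
        long rows = 0;
        long columns = 0;
        SimpleCell prev = null;
        for (SimpleCell cell : sortedCells) {
            boolean newRow = prev == null || !Arrays.equals(prev.row, cell.row);
            if (newRow) {
                rows++;
            }
            // Several versions of one column share row and qualifier,
            // so they are counted as a single column.
            if (newRow || !Arrays.equals(prev.qualifier, cell.qualifier)) {
                columns++;
            }
            prev = cell;
        }
        return new long[] {rows, columns};
    }

    public static void main(String[] args) {
        List<SimpleCell> cells = Arrays.asList(
            new SimpleCell("r1", "a"),
            new SimpleCell("r1", "a"),  // older version of r1/a
            new SimpleCell("r1", "b"),
            new SimpleCell("r2", "a"));
        long[] result = count(cells);
        System.out.println("rows=" + result[0] + " columns=" + result[1]);
    }
}
```

The byte-array comparisons run once per cell, which is why the cost scales 
with the number of cells compacted rather than with the number of rows.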


> Collect row counts per region in stats table
> --------------------------------------------
>
>                 Key: PHOENIX-1453
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1453
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: ramkrishna.s.vasudevan
>
> We currently collect guideposts per equal chunk, but we should also capture 
> row counts. Should we have a parallel array with the guideposts that count 
> rows per guidepost, or is it enough to have a per region count?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
