[ https://issues.apache.org/jira/browse/PHOENIX-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216641#comment-14216641 ]
Lars Hofhansl commented on PHOENIX-1453:
----------------------------------------

One is to estimate the number of rows in the table without needing to scan it in its entirety. Also, in HBase there is a cost per row and a cost per byte, so having row counts would allow us to make better parallelization decisions. It might even be good to count the number of columns.

The risk here is that this would require new comparisons per Cell to detect when a new row or column starts. As we've seen before, this adds measurable CPU to compactions.

> Collect row counts per region in stats table
> --------------------------------------------
>
>                 Key: PHOENIX-1453
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1453
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: ramkrishna.s.vasudevan
>
> We currently collect guideposts per equal chunk, but we should also capture
> row counts. Should we have a parallel array with the guideposts that count
> rows per guidepost, or is it enough to have a per region count?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
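
To make the per-Cell comparison cost in the comment above concrete, here is a minimal sketch of counting rows and columns in a single pass over sorted cells. It is only an illustration under stated assumptions: `SimpleCell`, `Counts`, and `count` are hypothetical names, not HBase or Phoenix APIs, column families are ignored, and a "column" is taken to mean a distinct (row, qualifier) pair.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class RowCountSketch {
    /** Hypothetical stand-in for a cell: row key and column qualifier only. */
    static final class SimpleCell {
        final byte[] row;
        final byte[] qualifier;

        SimpleCell(String row, String qualifier) {
            this.row = row.getBytes(StandardCharsets.UTF_8);
            this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
        }
    }

    /** Totals gathered from one pass over the cells. */
    static final class Counts {
        long rows;
        long columns;
        long cells;
    }

    /**
     * Assumes the cells arrive sorted by row, then qualifier, as they would
     * in a scan or compaction. Every cell after the first costs one row
     * comparison, plus a qualifier comparison when the row is unchanged.
     */
    static Counts count(List<SimpleCell> sortedCells) {
        Counts c = new Counts();
        SimpleCell prev = null;
        for (SimpleCell cell : sortedCells) {
            c.cells++;
            if (prev == null || !Arrays.equals(prev.row, cell.row)) {
                // A new row starts, which also starts a new column.
                c.rows++;
                c.columns++;
            } else if (!Arrays.equals(prev.qualifier, cell.qualifier)) {
                // Same row, different qualifier: a new column.
                c.columns++;
            }
            // Same row and qualifier: another version of the same column.
            prev = cell;
        }
        return c;
    }

    public static void main(String[] args) {
        List<SimpleCell> cells = Arrays.asList(
                new SimpleCell("r1", "a"),
                new SimpleCell("r1", "a"), // older version of r1:a
                new SimpleCell("r1", "b"),
                new SimpleCell("r2", "a"));
        Counts c = count(cells);
        System.out.println(c.rows + " rows, " + c.columns + " columns, " + c.cells + " cells");
    }
}
```

The extra `Arrays.equals` calls per cell are the kind of added work the comment warns about. Counting rows only would skip the qualifier comparison but still pay for one row-key comparison per cell.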