[ https://issues.apache.org/jira/browse/PHOENIX-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216787#comment-14216787 ]
Jeffrey Zhong commented on PHOENIX-1453:
----------------------------------------
The best option would be to keep those aggregate statistics at the HFile
block/file level so that we can read them at minimal cost instead of
scanning on demand, because scanning doesn't work for tables with billions or
trillions of rows. In live cases, we found that a select count(*) against a big
table drives CPU on all RSs up to 100%. I haven't dug into the issue yet, but
it seems we need some throttling.
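
As a rough illustration of the idea (a sketch only, not Phoenix or HBase code;
RegionStats, add_chunk, rows_up_to and estimated_count are hypothetical names),
row counts could be kept in an array parallel to the guideposts, so that a
count is answered by summing stored numbers rather than scanning rows:

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class RegionStats:
    """Hypothetical per-region stats: guideposts plus a parallel row-count array."""

    guideposts: List[bytes] = field(default_factory=list)  # sorted chunk end keys
    row_counts: List[int] = field(default_factory=list)    # rows in the chunk ending at guideposts[i]

    def add_chunk(self, end_key: bytes, rows: int) -> None:
        # Assumes chunks are added in ascending key order, as a stats
        # collector walking the region would produce them.
        self.guideposts.append(end_key)
        self.row_counts.append(rows)

    def total_rows(self) -> int:
        # Per-region count, derived from the per-guidepost counts.
        return sum(self.row_counts)

    def rows_up_to(self, key: bytes) -> int:
        # Estimate at guidepost granularity: only chunks whose end key
        # is <= key are counted.
        i = bisect_right(self.guideposts, key)
        return sum(self.row_counts[:i])


def estimated_count(regions: Iterable[RegionStats]) -> int:
    # Table-level estimate without touching any row data.
    return sum(r.total_rows() for r in regions)
```

With per-guidepost counts, the per-region count falls out as a sum, while a
per-region count alone could not give estimates for partial key ranges. The
result is only as fresh as the last stats collection.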
> Collect row counts per region in stats table
> --------------------------------------------
>
> Key: PHOENIX-1453
> URL: https://issues.apache.org/jira/browse/PHOENIX-1453
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: James Taylor
> Assignee: ramkrishna.s.vasudevan
>
> We currently collect guideposts per equal chunk, but we should also capture
> row counts. Should we have a parallel array with the guideposts that count
> rows per guidepost, or is it enough to have a per region count?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)