[ https://issues.apache.org/jira/browse/PHOENIX-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217087#comment-14217087 ]
Andrew Purtell commented on PHOENIX-1453:
-----------------------------------------

bq. The best option is that we can keep those aggregate statistics in HFile block/file level so that we can get those stats with min cost instead of scanning on demand because it doesn't work for table with billions/trillion rows.

Concur. We can add this in HBase. We'd need compactor changes, storage of these stats in a new HFile metadata block, and a new API for getting aggregates from StoreFile and Store. Am I missing anything?

> Collect row counts per region in stats table
> --------------------------------------------
>
>                 Key: PHOENIX-1453
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1453
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: ramkrishna.s.vasudevan
>
> We currently collect guideposts per equal chunk, but we should also capture
> row counts. Should we have a parallel array with the guideposts that count
> rows per guidepost, or is it enough to have a per region count?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
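
To make the proposal in the comment concrete, here is a minimal sketch of the idea: each file carries precomputed stats, a compaction replaces the input files with one file whose stats are computed during the rewrite, and the store answers aggregate queries by summing per-file stats instead of scanning. Everything named here (StoreStatsSketch, FileStats, StoreStats, estimatedRowCount, replaceWithCompacted) is hypothetical and exists only for illustration; none of it is an HBase or Phoenix API, and the comment does not specify what the real API would look like.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; these types do not exist in HBase. */
class StoreStatsSketch {

    /** Stats a flush or compaction could record in one file's metadata (assumed fields). */
    static final class FileStats {
        final long rowCount;
        final long totalBytes;

        FileStats(long rowCount, long totalBytes) {
            this.rowCount = rowCount;
            this.totalBytes = totalBytes;
        }
    }

    /** Stands in for a store: a set of files, each with precomputed stats. */
    static final class StoreStats {
        private final List<FileStats> files = new ArrayList<>();

        /** Called when a new file (for example, from a flush) joins the store. */
        void addFile(FileStats stats) {
            files.add(stats);
        }

        /** Called after a compaction of all files: the inputs are replaced by one output. */
        void replaceWithCompacted(FileStats compacted) {
            files.clear();
            files.add(compacted);
        }

        /**
         * Sums per-file row counts without reading any data.
         * This can overcount: a row with cells in several files is counted
         * once per file until a compaction merges those files.
         */
        long estimatedRowCount() {
            long total = 0;
            for (FileStats f : files) {
                total += f.rowCount;
            }
            return total;
        }
    }

    public static void main(String[] args) {
        StoreStats store = new StoreStats();
        store.addFile(new FileStats(100, 4096));
        store.addFile(new FileStats(50, 2048));
        System.out.println("before compaction: " + store.estimatedRowCount());

        // Assume 30 rows had cells in both files, so the merged file holds 120 rows.
        store.replaceWithCompacted(new FileStats(120, 5000));
        System.out.println("after compaction: " + store.estimatedRowCount());
    }
}
```

The sum is only an estimate between compactions, which is one reason the compactor would need to recompute the stats when it rewrites files rather than simply adding up the inputs.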