[
https://issues.apache.org/jira/browse/PHOENIX-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217087#comment-14217087
]
Andrew Purtell edited comment on PHOENIX-1453 at 11/19/14 12:05 AM:
--------------------------------------------------------------------
bq. The best option is that we keep those aggregate statistics at the HFile
block/file level so that we can get those stats at minimal cost instead of
scanning on demand, because scanning doesn't work for tables with
billions/trillions of rows.
Concur. We can add this in HBase. We'd need minor compaction code changes to
track the stats while compacting, storage of these stats in a new HFile
metadata block, and a new API for getting aggregates from StoreFile and Store.
Am I missing anything?
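To make the shape of that concrete, here is a minimal sketch of the idea: count rows while rewriting cells during compaction, record the total under a metadata key (standing in for a new HFile metadata block), and sum per-file counts at the Store level without a scan. All class and key names here (FileSketch, ROW_COUNT_KEY, rowCountEstimate) are hypothetical stand-ins, not actual HBase APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch: a compactor counts rows while rewriting them and
// records the total in per-file metadata, mimicking what a new HFile
// metadata block could hold.
public class CompactionStatsSketch {
    // Hypothetical metadata key; HBase would define its own constant.
    static final String ROW_COUNT_KEY = "ROW_COUNT";

    // Stand-in for a store file: sorted row keys plus a metadata map.
    static class FileSketch {
        final List<String> rowKeys;
        final Map<String, Long> metadata = new LinkedHashMap<>();
        FileSketch(List<String> rowKeys) { this.rowKeys = rowKeys; }
    }

    // "Compact" input files into one, tracking the row count as rows are
    // written instead of re-scanning the output afterwards.
    static FileSketch compact(List<FileSketch> inputs) {
        TreeSet<String> merged = new TreeSet<>();
        for (FileSketch f : inputs) merged.addAll(f.rowKeys);
        FileSketch out = new FileSketch(new ArrayList<>(merged));
        out.metadata.put(ROW_COUNT_KEY, (long) merged.size());
        return out;
    }

    // Hypothetical Store-level aggregate API: sum per-file counts, no scan.
    static long rowCountEstimate(List<FileSketch> storeFiles) {
        long total = 0;
        for (FileSketch f : storeFiles) {
            total += f.metadata.getOrDefault(ROW_COUNT_KEY, 0L);
        }
        return total;
    }
}
```

One caveat the sketch makes visible: summing per-file counts over-counts rows that exist in multiple store files until a compaction merges them, so the Store-level number is an estimate until major compaction.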
was (Author: apurtell):
bq. The best option is that we can keep those aggregate statistics in HFile
block/file level so that we can get those stats with min cost instead of
scanning on demand because it doesn't work for table with billions/trillion
rows.
Concur. We can add this in HBase. We'd need compactor changes, storage of these
stats in a new HFile metadata block, and a new API for getting aggregates from
StoreFile and Store. Am I missing anything?
> Collect row counts per region in stats table
> --------------------------------------------
>
> Key: PHOENIX-1453
> URL: https://issues.apache.org/jira/browse/PHOENIX-1453
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: James Taylor
> Assignee: ramkrishna.s.vasudevan
>
> We currently collect guideposts per equal chunk, but we should also capture
> row counts. Should we have a parallel array with the guideposts that count
> rows per guidepost, or is it enough to have a per region count?
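A small sketch of the parallel-array option from the description, with hypothetical names (GuidepostStatsSketch and its fields are illustrative, not Phoenix classes): if each guidepost carries its own row count, the per-region count falls out by summation, so the finer-grained representation subsumes the per-region one.

```java
import java.util.List;

// Hypothetical sketch: guideposts collected per equal-size chunk, with a
// parallel array holding the row count for each chunk.
public class GuidepostStatsSketch {
    final List<String> guideposts;   // chunk boundary row keys
    final long[] rowsPerGuidepost;   // parallel array: rows in each chunk

    GuidepostStatsSketch(List<String> guideposts, long[] rowsPerGuidepost) {
        if (guideposts.size() != rowsPerGuidepost.length) {
            throw new IllegalArgumentException("arrays must be parallel");
        }
        this.guideposts = guideposts;
        this.rowsPerGuidepost = rowsPerGuidepost;
    }

    // The per-region count is just the sum of the per-guidepost counts,
    // so storing counts per guidepost loses nothing over a region total.
    long regionRowCount() {
        long total = 0;
        for (long n : rowsPerGuidepost) total += n;
        return total;
    }
}
```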
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)