[ https://issues.apache.org/jira/browse/PHOENIX-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238312#comment-14238312 ]

James Taylor commented on PHOENIX-1453:
---------------------------------------

Yes, correct. For PTableStats (at least the in-memory representation), we'd
have a value of 10 in the parallel long array for the guideposts of that
region. We can easily optimize this down the road if need be. For example, we
could store the row count per guidepost a single time in the PTableStatsImpl
if it's the same in every region. It would *only* differ if the guidepost
depth was modified but UPDATE STATISTICS wasn't run: any region that was then
major compacted would start to report a different value.
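A minimal Java sketch of that in-memory idea, assuming a flat array of row counts parallel to the table's guidepost keys. `GuidepostRowCounts`, `fromCounts`, `rowCount` and `totalRowCount` are hypothetical names for illustration, not Phoenix's PTableStatsImpl API. The count is kept as a single value when every guidepost has the same one, and as the full parallel array otherwise.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Phoenix's actual PTableStatsImpl API): row counts
 * parallel to the table's guidepost keys, collapsed to one shared value when
 * every guidepost covers the same number of rows.
 */
final class GuidepostRowCounts {
    private final int size;            // number of guideposts
    private final long uniformCount;   // meaningful only when perGuidepost == null
    private final long[] perGuidepost; // null when all counts are equal

    private GuidepostRowCounts(int size, long uniformCount, long[] perGuidepost) {
        this.size = size;
        this.uniformCount = uniformCount;
        this.perGuidepost = perGuidepost;
    }

    /** Stores one value if all counts match; otherwise keeps a copy of the parallel array. */
    static GuidepostRowCounts fromCounts(long[] counts) {
        if (counts.length > 0 && Arrays.stream(counts).allMatch(c -> c == counts[0])) {
            return new GuidepostRowCounts(counts.length, counts[0], null);
        }
        return new GuidepostRowCounts(counts.length, 0L, counts.clone());
    }

    boolean isUniform() {
        return perGuidepost == null;
    }

    /** Row count for the guidepost at the given index in the parallel array. */
    long rowCount(int guidepostIndex) {
        if (guidepostIndex < 0 || guidepostIndex >= size) {
            throw new IndexOutOfBoundsException("guidepost " + guidepostIndex);
        }
        return perGuidepost == null ? uniformCount : perGuidepost[guidepostIndex];
    }

    /** Total rows covered, derived from the per-guidepost counts. */
    long totalRowCount() {
        return perGuidepost == null ? uniformCount * size : Arrays.stream(perGuidepost).sum();
    }

    public static void main(String[] args) {
        // Every guidepost covers 10 rows, so a single value is stored.
        GuidepostRowCounts uniform = fromCounts(new long[] {10, 10, 10});
        // Hypothetical: one region was major compacted after the guidepost
        // depth changed, without UPDATE STATISTICS, so its count differs.
        GuidepostRowCounts mixed = fromCounts(new long[] {10, 10, 20});
        System.out.println(uniform.isUniform() + " " + uniform.totalRowCount()); // true 30
        System.out.println(mixed.isUniform() + " " + mixed.totalRowCount());     // false 40
    }
}
```

The fallback branch is there for the case described above: once regions disagree on the per-guidepost count, the single shared value no longer applies and the full parallel array is needed again.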

We wouldn't need to store a BIGINT ARRAY with 10 for every corresponding
guidepost key, though. The stored format is the part that matters most to get
right initially, because otherwise we'd have to convert the data later when we
change the implementation.

> Collect row counts per region in stats table
> --------------------------------------------
>
>                 Key: PHOENIX-1453
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1453
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: ramkrishna.s.vasudevan
>         Attachments: Phoenix-1453.patch, Phoenix-1453_1.patch, 
> Phoenix-1453_2.patch, Phoenix-1453_3.patch, Phoenix-1453_7.patch
>
>
> We currently collect guideposts per equal chunk, but we should also capture 
> row counts. Should we have a parallel array with the guideposts that count 
> rows per guidepost, or is it enough to have a per region count?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
