[jira] [Comment Edited] (PHOENIX-1453) Collect row counts per region in stats table

James Taylor (JIRA) Sun, 07 Dec 2014 22:57:13 -0800

    [ https://issues.apache.org/jira/browse/PHOENIX-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237524#comment-14237524 ]


James Taylor edited comment on PHOENIX-1453 at 12/8/14 6:55 AM:
----------------------------------------------------------------

[~ramkrishna] - one other thought I had. It'd be fine to keep a single 
ROW_COUNT on each row of the stats table that holds the total number of rows 
for that region. We also have the total number of bytes for that region. These 
two together are what we need. We just have to make sure that PTableStats ends 
up with either a) two parallel arrays: bytes per guidepost and total row 
count, or b) a per-guidepost structure that includes the byte count and the 
number of rows. In other words, we don't necessarily need a row count for 
every guidepost - it's just an estimate, and it's fine if we maintain it per 
region. On the client side it's useful to have the parallel arrays, as we can 
easily maintain the equivalent parallel arrays for row count and bytes to get 
the information we need.
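
To make the two options in the comment above concrete, here is a minimal 
sketch in Java. It is not the actual PTableStats API: the class 
RegionStatsSketch, the type GuidePostEstimate, and both methods are 
hypothetical names. Splitting the per-region row count across guideposts in 
proportion to their byte counts is also an assumption, used only to show how 
a client could derive a parallel row-count array from one ROW_COUNT per 
region.

```java
// Hypothetical sketch; none of these names come from Phoenix itself.
final class RegionStatsSketch {

    // Option (b): a per-guidepost structure carrying both figures.
    static final class GuidePostEstimate {
        final long byteCount;
        final long estimatedRows;

        GuidePostEstimate(long byteCount, long estimatedRows) {
            this.byteCount = byteCount;
            this.estimatedRows = estimatedRows;
        }
    }

    // Option (a): given bytes per guidepost and one row count for the whole
    // region, build a parallel array of estimated rows per guidepost.
    // Assumption: rows are spread in proportion to bytes.
    static long[] estimateRowsPerGuidePost(long[] bytesPerGuidePost, long regionRowCount) {
        long totalBytes = 0;
        for (long bytes : bytesPerGuidePost) {
            totalBytes += bytes;
        }
        long[] rows = new long[bytesPerGuidePost.length];
        if (totalBytes == 0) {
            return rows; // no byte information, so every estimate stays 0
        }
        for (int i = 0; i < rows.length; i++) {
            rows[i] = Math.round((double) regionRowCount * bytesPerGuidePost[i] / totalBytes);
        }
        return rows;
    }

    // Zip the two parallel arrays into the option (b) representation.
    static GuidePostEstimate[] toEstimates(long[] bytesPerGuidePost, long[] rowsPerGuidePost) {
        GuidePostEstimate[] out = new GuidePostEstimate[bytesPerGuidePost.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = new GuidePostEstimate(bytesPerGuidePost[i], rowsPerGuidePost[i]);
        }
        return out;
    }
}
```

Either representation carries the same information; the parallel arrays are 
simply easier to keep aligned with an existing guidepost array on the client.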


was (Author: jamestaylor):
[~ramkrishna] - one other thought I had. It'd be fine to keep a single 
ROW_COUNT on each row of the status table that is the total number of rows for 
that region. We also have the total number of bytes for that region. These two 
together are what we need. We just have to make sure that on PTableStats that 
we either a) end up with two parallel arrays: bytes per guide post and total 
row count or b) we have a kind of per guidepost structure that includes the 
byte count and number of rows. In other words, we don't necessarily need a row 
count for every guidepost - it's just an estimate and it's fine if we maintain 
this per region. On the client-side it's useful to have the parallel arrays as 
we can easily maintain the equivalent parallel array for row count and bytes to 
get the information we need.

> Collect row counts per region in stats table
> --------------------------------------------
>
>                 Key: PHOENIX-1453
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1453
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: ramkrishna.s.vasudevan
>         Attachments: Phoenix-1453.patch, Phoenix-1453_1.patch, 
> Phoenix-1453_2.patch, Phoenix-1453_3.patch
>
>
> We currently collect guideposts per equal chunk, but we should also capture 
> row counts. Should we have a parallel array with the guideposts that count 
> rows per guidepost, or is it enough to have a per region count?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
