[
https://issues.apache.org/jira/browse/PHOENIX-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237524#comment-14237524
]
James Taylor edited comment on PHOENIX-1453 at 12/8/14 6:55 AM:
----------------------------------------------------------------
[~ramkrishna] - one other thought I had. It'd be fine to keep a single
ROW_COUNT on each row of the stats table that is the total number of rows for
that region. We also have the total number of bytes for that region. These two
together are what we need. We just have to make sure that in PTableStats we
end up with either a) two parallel arrays: bytes per guidepost and total
row count, or b) a per-guidepost structure that includes the byte count and
number of rows. In other words, we don't necessarily need a row count for
every guidepost - it's just an estimate, and it's fine if we maintain it per
region. On the client side it's useful to have the parallel arrays, as we can
easily maintain equivalent parallel arrays for row count and bytes to get the
information we need.
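
To make option (a) concrete, here is a rough sketch of how a client might turn
the per-region totals into per-guidepost estimates. It is only an illustration:
the class and method names are hypothetical (this is not the PTableStats API),
and splitting the region row count in proportion to each guidepost's share of
the bytes is an assumption on my part, not something the patch does.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Phoenix; names are illustrative only.
final class RegionStatsSketch {
    private RegionStatsSketch() {}

    /**
     * Builds a row-count array parallel to bytesPerGuidePost.
     * Assumption: rows are spread across guideposts in proportion to bytes,
     * so the result is an estimate and may not sum exactly to regionRowCount.
     */
    static long[] estimateRowsPerGuidePost(long[] bytesPerGuidePost, long regionRowCount) {
        long totalBytes = Arrays.stream(bytesPerGuidePost).sum();
        long[] rows = new long[bytesPerGuidePost.length];
        if (totalBytes == 0) {
            return rows; // no byte information, so no basis for an estimate
        }
        for (int i = 0; i < bytesPerGuidePost.length; i++) {
            double share = (double) bytesPerGuidePost[i] / totalBytes;
            rows[i] = Math.round(share * regionRowCount);
        }
        return rows;
    }
}
```

With byte counts of {100, 100, 200} and a region ROW_COUNT of 1000, index i of
the result lines up with guidepost i, so the two arrays can be walked together.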
> Collect row counts per region in stats table
> --------------------------------------------
>
> Key: PHOENIX-1453
> URL: https://issues.apache.org/jira/browse/PHOENIX-1453
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: James Taylor
> Assignee: ramkrishna.s.vasudevan
> Attachments: Phoenix-1453.patch, Phoenix-1453_1.patch,
> Phoenix-1453_2.patch, Phoenix-1453_3.patch
>
>
> We currently collect guideposts per equal chunk, but we should also capture
> row counts. Should we have a parallel array with the guideposts that count
> rows per guidepost, or is it enough to have a per region count?