[jira] [Commented] (PHOENIX-1453) Collect row counts per region in stats table

ramkrishna.s.vasudevan (JIRA) Tue, 09 Dec 2014 22:11:49 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240708#comment-14240708
 ]


ramkrishna.s.vasudevan commented on PHOENIX-1453:
-------------------------------------------------

Regarding the change to byteCount[] and rowCount[]: how are we going to handle 
the toBytes() in it? I am looking at this in terms of the problems it would 
cause for the current impl if we change its format. That is the reason I tried 
to divide the byteCount and rowCount over the total number of guideposts. The 
suggestion of adding the rowCount as an array is absolutely right, so that we 
don't lose track of it, and in that sense tracking it as BIGINT[] itself would 
be fine, as we would have a one-to-one mapping. Anyway, for the rowCount we 
could still track it as BIGINT (as it is a new entity we are adding) and 
convert it into an array on the client side only. But what about the 
byteCount, given the existing toBytes()? Hence I thought of going the same way 
for both byteCount and rowCount. 
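
For illustration only (this is not code from the patch): a minimal sketch of 
what "dividing a count over the total guideposts" on the client side could 
look like, assuming the stats table stores a single non-negative BIGINT total 
and the client knows the number of guideposts. The class and method names 
below are hypothetical and are not part of Phoenix.

```java
import java.util.Arrays;

// Hypothetical helper, not a Phoenix class: spreads one stored total
// (e.g. a rowCount or byteCount kept as a single BIGINT) over N guideposts.
final class GuidePostCounts {

    private GuidePostCounts() {
    }

    /**
     * Returns an array with one entry per guidepost whose entries sum to
     * {@code total}. Any remainder is handed out one unit at a time to the
     * earliest entries. Assumes {@code total} is non-negative.
     */
    static long[] spread(long total, int guidePostCount) {
        if (guidePostCount <= 0) {
            throw new IllegalArgumentException("guidePostCount must be positive");
        }
        long[] perGuidePost = new long[guidePostCount];
        long base = total / guidePostCount;
        long remainder = total % guidePostCount;
        for (int i = 0; i < guidePostCount; i++) {
            perGuidePost[i] = base + (i < remainder ? 1 : 0);
        }
        return perGuidePost;
    }

    public static void main(String[] args) {
        // 10 rows over 3 guideposts
        System.out.println(Arrays.toString(spread(10L, 3)));
    }
}
```

Note that an even split like this is only an approximation: it keeps the 
stored format a plain BIGINT, but it cannot recover the real per-guidepost 
distribution, which is what a true BIGINT[] column would preserve.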

bq. Also, why is rowCount kept in GuidePostsState instead of just in 
GuidePostsInfo? 
This can be done, I think.
bq. I think it can just be local to collectStatistics().
I am not sure how this can be done easily. The guidePostsMap is populated 
after the first row itself, and it knows what the new CFs are. So from the 2nd 
row onwards the guidepostinfo is not going to be null for any of the cells 
under a CF. This means that every time we try to add a guidepostinfo in the 
local tracker, we just keep adding for all those cells. There is no check in 
the code to see whether we have moved to a new CF, except that we check in the 
guidePostsMap whether the cell belongs to an existing CF. I tried the local 
approach and found that we were repeating the gps and the increment was not 
correct.



> Collect row counts per region in stats table
> --------------------------------------------
>
>                 Key: PHOENIX-1453
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1453
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: ramkrishna.s.vasudevan
>         Attachments: Phoenix-1453.patch, Phoenix-1453_1.patch, 
> Phoenix-1453_2.patch, Phoenix-1453_3.patch, Phoenix-1453_7.patch, 
> Phoenix-1453_8.patch
>
>
> We currently collect guideposts per equal chunk, but we should also capture 
> row counts. Should we have a parallel array with the guideposts that count 
> rows per guidepost, or is it enough to have a per region count?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
