[ https://issues.apache.org/jira/browse/PHOENIX-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240708#comment-14240708 ]
ramkrishna.s.vasudevan commented on PHOENIX-1453:
-------------------------------------------------

Regarding the change to byteCount[] and rowCount[]: how are we going to handle toBytes() for them? I am looking at this in terms of the problems it causes for the current implementation if we change its format, and that is why I tried to divide the byteCount and rowCount over the total number of guideposts.

The suggestion to add rowCount as an array is absolutely right, so that we don't lose track of it; in that sense, tracking it as BIGINT[] would be fine, since it gives us a one-to-one mapping. Alternatively, for rowCount we could still track it as BIGINT (as it is a new entity we are adding) and convert it into an array on the client side only. But what about byteCount, given the existing toBytes()? Hence I thought of handling byteCount and rowCount the same way.

bq. Also, why is rowCount kept in GuidePostsState instead of just in GuidePostsInfo?
I think this can be done.

bq. I think it can just be local to collectStatistics().
I am not sure how this can be done easily. The guidePostsMap is populated after the very first row, and it knows what the new CFs are. So from the second row onwards the guidepost info is not going to be null for any of the cells under a CF. Every time we try to add a guidepost info in the local tracker, we just keep adding for all those cells. There is no check in the code to see whether we have moved to a new CF, except that we check guidePostsMap to see whether the cell belongs to an existing CF. I tried the local approach and found that we were repeating the guideposts and the increment was not correct.
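To make the two options discussed above concrete, here is a minimal sketch in Java. It is not the Phoenix code: the class GuidePostCounts, its methods, and the depthBytes threshold are all hypothetical names made up for illustration. It shows one tracker that keeps a row count and byte count per guidepost (the one-to-one mapping), next to a helper that spreads a single total evenly over the guideposts, which is one reading of "divide the byteCount and rowCount over the total guide posts".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration only; this is not Phoenix's GuidePostsInfo. */
final class GuidePostCounts {
    private final long depthBytes;                 // assumed: bytes to accumulate per guidepost
    private final List<byte[]> guidePosts = new ArrayList<>();
    private final List<Long> rowCounts = new ArrayList<>();   // parallel to guidePosts
    private final List<Long> byteCounts = new ArrayList<>();  // parallel to guidePosts
    private long pendingRows;
    private long pendingBytes;

    GuidePostCounts(long depthBytes) {
        if (depthBytes <= 0) {
            throw new IllegalArgumentException("depthBytes must be positive");
        }
        this.depthBytes = depthBytes;
    }

    /** Record one row; emit a guidepost once enough bytes have accumulated. */
    void addRow(byte[] rowKey, long rowSizeBytes) {
        pendingRows++;
        pendingBytes += rowSizeBytes;
        if (pendingBytes >= depthBytes) {
            guidePosts.add(rowKey.clone());
            rowCounts.add(pendingRows);    // entry i belongs to guidepost i
            byteCounts.add(pendingBytes);
            pendingRows = 0;
            pendingBytes = 0;
        }
    }

    int guidePostCount() {
        return guidePosts.size();
    }

    List<Long> rowCounts() {
        return Collections.unmodifiableList(rowCounts);
    }

    List<Long> byteCounts() {
        return Collections.unmodifiableList(byteCounts);
    }

    /** Alternative: keep only one total and split it evenly across guideposts. */
    static long[] spreadEvenly(long total, int guidePostCount) {
        long[] out = new long[guidePostCount];
        if (guidePostCount == 0) {
            return out;
        }
        long base = total / guidePostCount;
        long remainder = total % guidePostCount;
        for (int i = 0; i < guidePostCount; i++) {
            // Hand the remainder out one unit at a time to the first entries.
            out[i] = base + (i < remainder ? 1 : 0);
        }
        return out;
    }
}
```

The parallel lists keep the actual distribution of rows between guideposts, while spreadEvenly() only needs a single stored total but loses that distribution. Rows added after the last emitted guidepost stay in the pending counters in this sketch; how a real implementation would account for them is left open here.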
> Collect row counts per region in stats table
> --------------------------------------------
>
>                 Key: PHOENIX-1453
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1453
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: ramkrishna.s.vasudevan
>         Attachments: Phoenix-1453.patch, Phoenix-1453_1.patch,
> Phoenix-1453_2.patch, Phoenix-1453_3.patch, Phoenix-1453_7.patch,
> Phoenix-1453_8.patch
>
>
> We currently collect guideposts per equal chunk, but we should also capture
> row counts. Should we have a parallel array with the guideposts that count
> rows per guidepost, or is it enough to have a per region count?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)