[ 
https://issues.apache.org/jira/browse/PHOENIX-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karan Mehta updated PHOENIX-4953:
---------------------------------
    Description: 
The issue was found during a sanity test run when the count of all rows from 
all the guideposts didn't match the actual number of rows in the table. 

The {{DefaultStatisticsCollector#collectStatistics()}} method iterates over a 
list of cells and keeps track of the size of the KVs. If the size exceeds the 
guidepost width, it adds an entry to {{GuidePostsInfo}} using the 
{{GuidePostsInfoBuilder#addGuidePostOnCollection()}} method. 

However, for the last batch of rows, which doesn't cross the 
GUIDE_POSTS_WIDTH threshold, the code doesn't create any entry using the 
Builder class. Ideally, we would cover that scenario by introducing a small 
guidepost with the corresponding row key and the size of that guidepost 
(since we can persist both to the SYSTEM.STATS table). This is also 
reasonable because GUIDE_POSTS_WIDTH is only an estimate/best effort for the 
distribution of data. 

  was:
The issue was found during a sanity test run when the count of all rows from 
all the guideposts didn't match the actual number of rows in the table. 

`DefaultStatisticsCollector#collectStatistics()` method iterates over a list of 
cells and keeps track of size of KV's. If the size exceeds guideposts width, it 
adds an entry to `GuidePostsInfo using 
`GuidePostsInfoBuilder`addGuidePostOnCollection()` method. 

However for the last batch of rows that don't cross the threshold of 
GUIDE_POSTS_WIDTH, the code doesn't create any entry for it using the Builder 
class. In an ideal case, we would want to cover that scenario by introducing a 
small guide post with the corresponding row key and the size of the that 
guidepost (since we can persist both the things to SYSTEM.STATS table). This is 
also because GUIDE_POSTS_WIDTH is an estimate/best effort for distribution of 
data. 


> DefaultStatisticsCollector fails to capture the last guidepost of every region
> ------------------------------------------------------------------------------
>
>                 Key: PHOENIX-4953
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4953
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Karan Mehta
>            Priority: Major
>
> The issue was found during a sanity test run when the count of all rows from 
> all the guideposts didn't match the actual number of rows in the table. 
> The {{DefaultStatisticsCollector#collectStatistics()}} method iterates over 
> a list of cells and keeps track of the size of the KVs. If the size exceeds 
> the guidepost width, it adds an entry to {{GuidePostsInfo}} using the 
> {{GuidePostsInfoBuilder#addGuidePostOnCollection()}} method. 
> However, for the last batch of rows, which doesn't cross the 
> GUIDE_POSTS_WIDTH threshold, the code doesn't create any entry using the 
> Builder class. Ideally, we would cover that scenario by introducing a small 
> guidepost with the corresponding row key and the size of that guidepost 
> (since we can persist both to the SYSTEM.STATS table). This is also 
> reasonable because GUIDE_POSTS_WIDTH is only an estimate/best effort for the 
> distribution of data. 
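
To illustrate the behaviour described above, here is a minimal, self-contained 
sketch. It is not the Phoenix code: the GuidePostSketch class, its GuidePost 
type, the collect() method and the flushTail flag are all hypothetical names 
made up for this example, and rows are modelled as plain key/size pairs rather 
than cells. It only shows how a trailing batch smaller than the width goes 
uncounted unless it is flushed as a final, smaller guidepost.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of guidepost collection; not Phoenix's classes. */
public class GuidePostSketch {

    /** One guidepost: the row key it ends at, plus the bytes and rows it covers. */
    static final class GuidePost {
        final String rowKey;
        final long byteCount;
        final long rowCount;

        GuidePost(String rowKey, long byteCount, long rowCount) {
            this.rowKey = rowKey;
            this.byteCount = byteCount;
            this.rowCount = rowCount;
        }
    }

    /**
     * Walks the rows in order and emits a guidepost whenever the accumulated
     * size exceeds guidePostWidth. With flushTail == false the last partial
     * batch is dropped (the behaviour reported in this issue); with
     * flushTail == true it is emitted as a final, smaller guidepost.
     */
    static List<GuidePost> collect(String[] rowKeys, long[] rowSizes,
                                   long guidePostWidth, boolean flushTail) {
        List<GuidePost> guidePosts = new ArrayList<>();
        long bytes = 0;
        long rows = 0;
        for (int i = 0; i < rowKeys.length; i++) {
            bytes += rowSizes[i];
            rows++;
            if (bytes > guidePostWidth) {
                guidePosts.add(new GuidePost(rowKeys[i], bytes, rows));
                bytes = 0;
                rows = 0;
            }
        }
        // Assumed fix: cover the trailing rows that never crossed the width.
        if (flushTail && rows > 0) {
            guidePosts.add(new GuidePost(rowKeys[rowKeys.length - 1], bytes, rows));
        }
        return guidePosts;
    }

    /** Sums the row counts recorded across all guideposts. */
    static long totalRows(List<GuidePost> guidePosts) {
        long total = 0;
        for (GuidePost gp : guidePosts) {
            total += gp.rowCount;
        }
        return total;
    }

    public static void main(String[] args) {
        String[] keys = {"r1", "r2", "r3", "r4", "r5"};
        long[] sizes = {40, 40, 40, 40, 40};
        long width = 100;
        // Tail dropped: only r1..r3 are covered by a guidepost.
        System.out.println(totalRows(collect(keys, sizes, width, false))); // prints 3
        // Tail flushed: the guidepost row counts add up to the table's 5 rows.
        System.out.println(totalRows(collect(keys, sizes, width, true)));  // prints 5
    }
}
```

With five 40-byte rows and a width of 100, the first guidepost is emitted at r3 
(120 bytes, 3 rows) and r4..r5 (80 bytes) never cross the threshold, which is 
the mismatch between guidepost row counts and actual row counts noted above.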



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
