[ https://issues.apache.org/jira/browse/PHOENIX-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karan Mehta updated PHOENIX-4953:
---------------------------------
    Description: 
The issue was found during a sanity test run, when the sum of the row counts across all guideposts didn't match the actual number of rows in the table.

The {{DefaultStatisticsCollector#collectStatistics()}} method iterates over a list of cells and keeps track of the size of the KVs. If the size exceeds the guidepost width, it adds an entry to {{GuidePostsInfo}} using the {{GuidePostsInfoBuilder#addGuidePostOnCollection()}} method.

However, for the last batch of rows, which doesn't cross the GUIDE_POSTS_WIDTH threshold, the code doesn't create any entry using the Builder class. Ideally, we would cover that scenario by introducing a small guidepost with the corresponding row key and the size of that guidepost (since we can persist both to the SYSTEM.STATS table). This is acceptable because GUIDE_POSTS_WIDTH is only an estimate/best effort for the distribution of data.

  was:
The issue was found during a sanity test run when the count of all rows from all the guideposts didn't match the actual number of rows in the table.

`DefaultStatisticsCollector#collectStatistics()` method iterates over a list of cells and keeps track of size of KV's. If the size exceeds guideposts width, it adds an entry to `GuidePostsInfo using `GuidePostsInfoBuilder`addGuidePostOnCollection()` method.

However for the last batch of rows that don't cross the threshold of GUIDE_POSTS_WIDTH, the code doesn't create any entry for it using the Builder class. In an ideal case, we would want to cover that scenario by introducing a small guide post with the corresponding row key and the size of the that guidepost (since we can persist both the things to SYSTEM.STATS table). This is also because GUIDE_POSTS_WIDTH is an estimate/best effort for distribution of data.

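
To make the reported behaviour concrete, here is a minimal, self-contained sketch of the accumulation loop described above. It is not the Phoenix implementation: the class {{GuidePostSketch}}, its nested {{Row}} and {{GuidePost}} types, the {{flushTrailing}} switch, and the strict {{>}} comparison are all assumptions made for illustration. It only shows why the rows after the last threshold crossing go uncounted, and how emitting one trailing, undersized guidepost would account for them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the collector described in this issue.
// None of these names exist in Phoenix; they are illustration only.
final class GuidePostSketch {

    /** One row: its key and the total size in bytes of its key-values. */
    static final class Row {
        final String key;
        final long sizeBytes;

        Row(String key, long sizeBytes) {
            this.key = key;
            this.sizeBytes = sizeBytes;
        }
    }

    /** One guidepost: the last row key it covers, plus the bytes and rows it covers. */
    static final class GuidePost {
        final String rowKey;
        final long byteCount;
        final long rowCount;

        GuidePost(String rowKey, long byteCount, long rowCount) {
            this.rowKey = rowKey;
            this.byteCount = byteCount;
            this.rowCount = rowCount;
        }
    }

    /**
     * Walks the rows in key order and emits a guidepost each time the
     * accumulated size exceeds guidePostWidth (the exact comparison is an
     * assumption). With flushTrailing == false the leftover rows at the end
     * are silently dropped, which is the behaviour reported in this issue.
     */
    static List<GuidePost> collect(List<Row> rows, long guidePostWidth, boolean flushTrailing) {
        List<GuidePost> guidePosts = new ArrayList<>();
        long bytes = 0;
        long count = 0;
        String lastKey = null;
        for (Row row : rows) {
            bytes += row.sizeBytes;
            count++;
            lastKey = row.key;
            if (bytes > guidePostWidth) {
                guidePosts.add(new GuidePost(lastKey, bytes, count));
                bytes = 0;
                count = 0;
            }
        }
        // Proposed handling: emit a small guidepost for the final partial batch.
        if (flushTrailing && count > 0) {
            guidePosts.add(new GuidePost(lastKey, bytes, count));
        }
        return guidePosts;
    }

    /** Sums the row counts of all guideposts, as the sanity test would. */
    static long totalRows(List<GuidePost> guidePosts) {
        long total = 0;
        for (GuidePost gp : guidePosts) {
            total += gp.rowCount;
        }
        return total;
    }
}
```

For example, with five rows of 40 bytes each and a width of 100 bytes, {{collect(rows, 100, false)}} yields a single guidepost covering three rows, so two rows are unaccounted for. {{collect(rows, 100, true)}} adds a second, 80-byte guidepost ending at the last row key, and the totals match the table.
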
> DefaultStatisticsCollector fails to capture the last guidepost of every region
> ------------------------------------------------------------------------------
>
>                 Key: PHOENIX-4953
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4953
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Karan Mehta
>            Priority: Major
>
> The issue was found during a sanity test run, when the sum of the row counts
> across all guideposts didn't match the actual number of rows in the table.
>
> The {{DefaultStatisticsCollector#collectStatistics()}} method iterates over a
> list of cells and keeps track of the size of the KVs. If the size exceeds the
> guidepost width, it adds an entry to {{GuidePostsInfo}} using the
> {{GuidePostsInfoBuilder#addGuidePostOnCollection()}} method.
>
> However, for the last batch of rows, which doesn't cross the GUIDE_POSTS_WIDTH
> threshold, the code doesn't create any entry using the Builder class. Ideally,
> we would cover that scenario by introducing a small guidepost with the
> corresponding row key and the size of that guidepost (since we can persist
> both to the SYSTEM.STATS table). This is acceptable because GUIDE_POSTS_WIDTH
> is only an estimate/best effort for the distribution of data.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)