[ https://issues.apache.org/jira/browse/PHOENIX-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642157#comment-16642157 ]
Karan Mehta commented on PHOENIX-4953:
--------------------------------------

This can also help us determine whether guide posts are missing for a region completely, versus the data in a region being smaller than the guide post width (GPW).

> DefaultStatisticsCollector fails to capture the last guidepost of every region
> ------------------------------------------------------------------------------
>
>                 Key: PHOENIX-4953
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4953
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Karan Mehta
>            Priority: Major
>
> The issue was found during a sanity test run, when the sum of the row counts
> across all guideposts didn't match the actual number of rows in the table.
> The {{DefaultStatisticsCollector#collectStatistics()}} method iterates over a
> list of cells and keeps track of the size of the KVs. If the size exceeds the
> guidepost width, it adds an entry to {{GuidePostsInfo}} using the
> {{GuidePostsInfoBuilder#addGuidePostOnCollection()}} method.
> However, for the last batch of rows that doesn't cross the GUIDE_POSTS_WIDTH
> threshold, the code doesn't create any entry using the Builder class. In an
> ideal case, we would want to cover that scenario by introducing a small guide
> post with the corresponding row key and the size of that guidepost (since we
> can persist both to the SYSTEM.STATS table). This is also because
> GUIDE_POSTS_WIDTH is an estimate/best effort for the distribution of data.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
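
The behaviour described in the issue can be illustrated with a minimal, self-contained sketch. This is not Phoenix code: the class GuidePostSketch, its nested Collector and GuidePost types, and the close() hook are hypothetical names chosen for illustration, and rows are reduced to a key plus a byte size. It only shows the idea of emitting one extra, smaller guidepost for the trailing rows that never cross the width threshold, so that the per-guidepost row counts add up to the real row count.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names are Phoenix APIs.
class GuidePostSketch {

    /** One guidepost: the row key it ends at, plus the bytes and rows it covers. */
    static final class GuidePost {
        final String rowKey;
        final long byteCount;
        final long rowCount;

        GuidePost(String rowKey, long byteCount, long rowCount) {
            this.rowKey = rowKey;
            this.byteCount = byteCount;
            this.rowCount = rowCount;
        }
    }

    /** Accumulates row sizes for one region and emits guideposts. */
    static final class Collector {
        private final long guidePostWidth;
        private final List<GuidePost> guidePosts = new ArrayList<>();
        private long pendingBytes;
        private long pendingRows;
        private String lastRowKey;

        Collector(long guidePostWidth) {
            this.guidePostWidth = guidePostWidth;
        }

        /** Track one row; emit a guidepost once the accumulated size exceeds the width. */
        void collect(String rowKey, long rowSizeBytes) {
            pendingBytes += rowSizeBytes;
            pendingRows++;
            lastRowKey = rowKey;
            if (pendingBytes > guidePostWidth) {
                emit();
            }
        }

        /** Assumed end-of-region hook: flush the trailing rows as a smaller guidepost. */
        void close() {
            if (pendingRows > 0) {
                emit();
            }
        }

        private void emit() {
            guidePosts.add(new GuidePost(lastRowKey, pendingBytes, pendingRows));
            pendingBytes = 0;
            pendingRows = 0;
        }

        List<GuidePost> guidePosts() {
            return guidePosts;
        }

        /** Sum of row counts over all emitted guideposts. */
        long totalRows() {
            long total = 0;
            for (GuidePost gp : guidePosts) {
                total += gp.rowCount;
            }
            return total;
        }
    }

    public static void main(String[] args) {
        Collector collector = new Collector(100); // width of 100 bytes
        for (int i = 0; i < 7; i++) {
            collector.collect("row" + i, 40);     // 7 rows of 40 bytes each
        }
        System.out.println(collector.totalRows()); // prints 6: the last row is not covered
        collector.close();
        System.out.println(collector.totalRows()); // prints 7: trailing guidepost added
    }
}
```

Without the close() step, the seventh row belongs to no guidepost, which is the mismatch the sanity test reported. With it, a region whose data is smaller than the width still gets one guidepost, so a region with no guideposts at all can be told apart from a region that simply holds little data.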