Karan Mehta created PHOENIX-4953:
------------------------------------

             Summary: DefaultStatisticsCollector fails to capture the last guidepost of every region
                 Key: PHOENIX-4953
                 URL: https://issues.apache.org/jira/browse/PHOENIX-4953
             Project: Phoenix
          Issue Type: Bug
            Reporter: Karan Mehta

The issue was found during a sanity test run, when the sum of the row counts across all guideposts didn't match the actual number of rows in the table.

The `DefaultStatisticsCollector#collectStatistics()` method iterates over a list of cells and keeps track of the size of the KVs. When the accumulated size exceeds the guidepost width, it adds an entry to `GuidePostsInfo` using the `GuidePostsInfoBuilder#addGuidePostOnCollection()` method. However, for the last batch of rows in a region, which doesn't cross the GUIDE_POSTS_WIDTH threshold, the code never creates an entry through the builder, so those rows are missing from the statistics.

Ideally, we would cover that scenario by introducing a small guidepost with the corresponding row key and the size of that guidepost (we can persist both to the SYSTEM.STATS table). A guidepost smaller than the configured width is acceptable here because GUIDE_POSTS_WIDTH is only an estimate/best effort for the distribution of data.
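
To make the proposed behavior concrete, here is a minimal, self-contained sketch of the accumulate-and-flush logic described above. It is not the Phoenix implementation: `GuidePostSketch`, `GuidePost`, and `collect()` are hypothetical names, rows are reduced to a key and a size in bytes, and a guidepost is emitted when the accumulated size strictly exceeds the width, following the wording of the description.

```java
import java.util.ArrayList;
import java.util.List;

public class GuidePostSketch {

    /** Hypothetical guidepost: the last row key it covers plus its byte and row counts. */
    static final class GuidePost {
        final String rowKey;
        final long byteCount;
        final long rowCount;

        GuidePost(String rowKey, long byteCount, long rowCount) {
            this.rowKey = rowKey;
            this.byteCount = byteCount;
            this.rowCount = rowCount;
        }
    }

    /**
     * Walks the rows of one region in key order and emits a guidepost each time
     * the accumulated size exceeds guidePostWidth, then flushes the remainder.
     */
    static List<GuidePost> collect(List<String> rowKeys, List<Long> rowSizes, long guidePostWidth) {
        List<GuidePost> guidePosts = new ArrayList<>();
        long bytes = 0;
        long rows = 0;
        String lastKey = null;

        for (int i = 0; i < rowKeys.size(); i++) {
            lastKey = rowKeys.get(i);
            bytes += rowSizes.get(i);
            rows++;
            if (bytes > guidePostWidth) {
                guidePosts.add(new GuidePost(lastKey, bytes, rows));
                bytes = 0;
                rows = 0;
            }
        }

        // The step this issue asks for: rows left over after the last full
        // guidepost still get a (smaller) guidepost, so no rows go uncounted.
        if (rows > 0) {
            guidePosts.add(new GuidePost(lastKey, bytes, rows));
        }
        return guidePosts;
    }
}
```

For example, with five rows `r1` to `r5` of sizes 40, 40, 40, 40, and 30 bytes and a width of 100, the loop emits one guidepost at `r3` (120 bytes, 3 rows), and the final flush adds a second one at `r5` (70 bytes, 2 rows). Without the flush, the row counts would sum to 3 instead of 5, which is the kind of mismatch the sanity test exposed.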