Karan Mehta created PHOENIX-4953:
------------------------------------
Summary: DefaultStatisticsCollector fails to capture the last
guidepost of every region
Key: PHOENIX-4953
URL: https://issues.apache.org/jira/browse/PHOENIX-4953
Project: Phoenix
Issue Type: Bug
Reporter: Karan Mehta
The issue was found during a sanity test run, when the sum of the row counts
across all the guideposts didn't match the actual number of rows in the table.
The `DefaultStatisticsCollector#collectStatistics()` method iterates over a list
of cells and keeps track of the size of the KVs. When the accumulated size
exceeds the guidepost width, it adds an entry to `GuidePostsInfo` using the
`GuidePostsInfoBuilder#addGuidePostOnCollection()` method.
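
The accumulation described above can be modeled with a minimal sketch. This is
not the Phoenix source: the class `GuidePostAccumulationSketch`, the `GuidePost`
holder, and the `collect`/`totalRows` helpers are hypothetical names, rows are
reduced to a key plus a byte size, and the strict "greater than" comparison is
an assumption taken from the wording "exceeds".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the accumulation loop; not the Phoenix source.
class GuidePostAccumulationSketch {

    /** One emitted guidepost: the row key it ends at and what it covers. */
    static final class GuidePost {
        final String rowKey;
        final long byteCount;
        final long rowCount;

        GuidePost(String rowKey, long byteCount, long rowCount) {
            this.rowKey = rowKey;
            this.byteCount = byteCount;
            this.rowCount = rowCount;
        }
    }

    /** Emits a guidepost only when the running size exceeds the width. */
    static List<GuidePost> collect(List<String> rowKeys, long[] rowSizes, long guidePostWidth) {
        List<GuidePost> guidePosts = new ArrayList<>();
        long bytes = 0;
        long rows = 0;
        for (int i = 0; i < rowKeys.size(); i++) {
            bytes += rowSizes[i];
            rows++;
            if (bytes > guidePostWidth) {
                guidePosts.add(new GuidePost(rowKeys.get(i), bytes, rows));
                bytes = 0;
                rows = 0;
            }
        }
        // Rows accumulated after the last emitted guidepost are dropped here.
        return guidePosts;
    }

    /** Sums the row counts recorded across all guideposts. */
    static long totalRows(List<GuidePost> guidePosts) {
        long total = 0;
        for (GuidePost gp : guidePosts) {
            total += gp.rowCount;
        }
        return total;
    }
}
```

With five 40-byte rows and a width of 100 in this model, a single guidepost is
emitted at the third row, so the guideposts account for 3 rows while the table
holds 5.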
However, for the last batch of rows, which doesn't cross the GUIDE_POSTS_WIDTH
threshold, the code doesn't create any entry using the Builder class. Ideally,
we would cover that scenario by introducing a small guidepost with the
corresponding row key and the size of that guidepost (since we can persist both
to the SYSTEM.STATS table). This also makes sense because GUIDE_POSTS_WIDTH is
only an estimate/best effort for the distribution of data.
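
One way the proposed behavior might look is sketched below, as a self-contained
model rather than a patch. The class `GuidePostTrailingFlushSketch`, its
`GuidePost` holder, and the `collect` helper are hypothetical; the only point
illustrated is the extra flush after the loop, which records a smaller final
guidepost with the last row key and the actual accumulated size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the suggested fix; not the Phoenix source.
class GuidePostTrailingFlushSketch {

    /** One emitted guidepost: the row key it ends at and what it covers. */
    static final class GuidePost {
        final String rowKey;
        final long byteCount;
        final long rowCount;

        GuidePost(String rowKey, long byteCount, long rowCount) {
            this.rowKey = rowKey;
            this.byteCount = byteCount;
            this.rowCount = rowCount;
        }
    }

    /** Emits guideposts on crossing the width, then flushes any remaining rows. */
    static List<GuidePost> collect(List<String> rowKeys, long[] rowSizes, long guidePostWidth) {
        List<GuidePost> guidePosts = new ArrayList<>();
        long bytes = 0;
        long rows = 0;
        for (int i = 0; i < rowKeys.size(); i++) {
            bytes += rowSizes[i];
            rows++;
            if (bytes > guidePostWidth) {
                guidePosts.add(new GuidePost(rowKeys.get(i), bytes, rows));
                bytes = 0;
                rows = 0;
            }
        }
        // Trailing batch below the width: record it as a smaller final guidepost.
        if (rows > 0) {
            String lastKey = rowKeys.get(rowKeys.size() - 1);
            guidePosts.add(new GuidePost(lastKey, bytes, rows));
        }
        return guidePosts;
    }
}
```

In this model, five 40-byte rows with a width of 100 yield two guideposts: one
at the third row (120 bytes, 3 rows) and a trailing one at the fifth row
(80 bytes, 2 rows), so the row counts add up to the table's 5 rows.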
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)