[ https://issues.apache.org/jira/browse/PHOENIX-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karan Mehta updated PHOENIX-4953:
---------------------------------
Description:
The issue was found during a sanity test run, when the sum of the row counts
across all guideposts didn't match the actual number of rows in the table.
{{DefaultStatisticsCollector#collectStatistics()}} iterates over a list of
cells and keeps track of the cumulative size of the KVs. When that size exceeds
the guidepost width, it adds an entry to {{GuidePostsInfo}} using
{{GuidePostsInfoBuilder#addGuidePostOnCollection()}}.
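The collection loop described above can be modelled with a small, standalone
Java sketch. This is not Phoenix code: {{Cell}}, {{GuidePost}} and {{collect}}
are hypothetical stand-ins for the scanned cells and for the entries added via
{{GuidePostsInfoBuilder#addGuidePostOnCollection()}}. It assumes the cells are
sorted by row key and that the byte and row counters reset after each guidepost.

```java
import java.util.ArrayList;
import java.util.List;

public class GuidePostSketch {
    // Hypothetical stand-in for one KV: the row it belongs to and its size in bytes.
    record Cell(String rowKey, long sizeBytes) {}

    // Hypothetical guidepost entry: end row key, plus the bytes and rows it covers.
    record GuidePost(String rowKey, long byteCount, long rowCount) {}

    // Emits a guidepost only when the accumulated size exceeds the width,
    // mirroring the behavior described in this issue.
    static List<GuidePost> collect(List<Cell> cells, long guidePostWidth) {
        List<GuidePost> posts = new ArrayList<>();
        long bytes = 0;
        long rows = 0;
        String lastRow = null;
        for (Cell cell : cells) {
            // Count a row the first time one of its cells is seen.
            if (!cell.rowKey().equals(lastRow)) {
                rows++;
                lastRow = cell.rowKey();
            }
            bytes += cell.sizeBytes();
            if (bytes > guidePostWidth) {
                posts.add(new GuidePost(lastRow, bytes, rows));
                bytes = 0;
                rows = 0;
            }
        }
        // The leftover bytes/rows that never crossed the width are dropped here.
        return posts;
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            cells.add(new Cell("r" + i, 40)); // five single-cell rows of 40 bytes each
        }
        List<GuidePost> posts = collect(cells, 100); // guidepost width of 100 bytes
        long counted = posts.stream().mapToLong(GuidePost::rowCount).sum();
        System.out.println("rows counted by guideposts: " + counted + " of " + cells.size());
    }
}
```

With five 40-byte rows and a 100-byte width, only one guidepost (ending at
"r3") is emitted, so the program reports 3 of 5 rows: the same kind of mismatch
the sanity test observed.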
However, for the last batch of rows, which never crosses the GUIDE_POSTS_WIDTH
threshold, the code doesn't create any entry through the Builder class, so
those rows are not covered by any guidepost. Ideally, we would cover that case
by introducing a small final guidepost with the corresponding row key and the
size of that guidepost (both can be persisted to the SYSTEM.STATS table). A
guidepost smaller than GUIDE_POSTS_WIDTH is acceptable because the width is only
an estimate/best-effort target for distributing data.
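The proposed fix can be sketched as a standalone Java model: after the loop,
flush whatever is left into a final, smaller guidepost keyed on the last row
seen. This is a hypothetical illustration, not the actual Phoenix patch;
{{Cell}}, {{GuidePost}} and {{collect}} are made-up names, and in Phoenix the
equivalent entry would presumably be added through {{GuidePostsInfoBuilder}}.

```java
import java.util.ArrayList;
import java.util.List;

public class FinalGuidePostSketch {
    // Hypothetical stand-in for one KV (cells are assumed sorted by row key).
    record Cell(String rowKey, long sizeBytes) {}

    // Hypothetical guidepost entry: end row key, plus the bytes and rows it covers.
    record GuidePost(String rowKey, long byteCount, long rowCount) {}

    static List<GuidePost> collect(List<Cell> cells, long guidePostWidth) {
        List<GuidePost> posts = new ArrayList<>();
        long bytes = 0;
        long rows = 0;
        String lastRow = null;
        for (Cell cell : cells) {
            if (!cell.rowKey().equals(lastRow)) {
                rows++;
                lastRow = cell.rowKey();
            }
            bytes += cell.sizeBytes();
            if (bytes > guidePostWidth) {
                posts.add(new GuidePost(lastRow, bytes, rows));
                bytes = 0;
                rows = 0;
            }
        }
        // Proposed fix: emit a final guidepost for the remainder that never crossed
        // the width, so every row and byte is covered by some guidepost.
        if (bytes > 0 || rows > 0) {
            posts.add(new GuidePost(lastRow, bytes, rows));
        }
        return posts;
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            cells.add(new Cell("r" + i, 40)); // five single-cell rows of 40 bytes each
        }
        List<GuidePost> posts = collect(cells, 100); // guidepost width of 100 bytes
        long counted = posts.stream().mapToLong(GuidePost::rowCount).sum();
        System.out.println("rows counted by guideposts: " + counted + " of " + cells.size());
    }
}
```

With the same five 40-byte rows and a 100-byte width, this yields a regular
guidepost ending at "r3" plus an 80-byte final guidepost ending at "r5", and
the program reports 5 of 5 rows.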
> DefaultStatisticsCollector fails to capture the last guidepost of every region
> ------------------------------------------------------------------------------
>
> Key: PHOENIX-4953
> URL: https://issues.apache.org/jira/browse/PHOENIX-4953
> Project: Phoenix
> Issue Type: Bug
> Reporter: Karan Mehta
> Priority: Major
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)