[ https://issues.apache.org/jira/browse/PHOENIX-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453267#comment-16453267 ]
James Taylor commented on PHOENIX-4674:
---------------------------------------

Thanks for the test, [~abhishek.chouhan]. I tweaked it slightly - the current behavior is working as designed. The statistics reported are meant to be an upper bound on the amount of data scanned. In this case, statistics have been collected, but we know we have less than one guidepost width of data. So we use the guidepost width as the bytes scanned and estimate the row count based on our row width estimate.

We could use 0 as the estimate of bytes/rows scanned, but the disadvantage would be that if a very large guidepost width is configured, there may actually be a sizeable amount of data to scan (and the user would be given no indication of that).

> Incorrect stats if data size is less than guidepost width
> ---------------------------------------------------------
>
>                 Key: PHOENIX-4674
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4674
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Abhishek Singh Chouhan
>            Assignee: Abhishek Singh Chouhan
>            Priority: Major
>         Attachments: PHOENIX-4674.patch
>
>
> For a small table, let's say with a single region < guidepost width, the stats
> after running update statistics can be way off. This is because we get an
> empty guidepost for the region and in BaseResultIterators we end up
> estimating the #rows as guidepost width / estimated row size of the table. For a
> table having < 100 rows and a guidepost width of 100 MB, if the estimated
> row size is 100 bytes we end up estimating a million rows.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
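
For reference, the arithmetic in the issue description can be reproduced with a minimal sketch. This is not the code in BaseResultIterators; the class and method names are hypothetical, and 100 MB is assumed to mean 100 * 1024 * 1024 bytes. It only illustrates why a table with fewer than 100 rows can be reported as roughly a million rows when the guidepost width is used as the upper bound on bytes scanned.

```java
// Hypothetical illustration only; not Phoenix's actual implementation.
class GuidepostEstimate {

    // Upper-bound row estimate when a region holds less than one
    // guidepost width of data: bytes are assumed to equal the guidepost
    // width, and rows are derived from the estimated row size.
    static long estimateRows(long guidepostWidthBytes, long estimatedRowSizeBytes) {
        if (estimatedRowSizeBytes <= 0) {
            throw new IllegalArgumentException("estimated row size must be positive");
        }
        return guidepostWidthBytes / estimatedRowSizeBytes;
    }

    public static void main(String[] args) {
        long guidepostWidth = 100L * 1024 * 1024; // assumed: 100 MB in bytes
        long estimatedRowSize = 100L;             // bytes per row, from the issue text
        long rows = estimateRows(guidepostWidth, estimatedRowSize);
        System.out.println("estimated bytes scanned: " + guidepostWidth);
        System.out.println("estimated rows scanned:  " + rows);
    }
}
```

With these inputs the estimate comes out at about one million rows regardless of how few rows the table actually holds, which matches the behavior described in the report and is consistent with the estimate being intended as an upper bound.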