[jira] [Comment Edited] (PHOENIX-4008) UPDATE STATISTIC should run raw scan with all versions of cells

Bin Shi (JIRA) Thu, 13 Sep 2018 17:59:22 -0700


    [ https://issues.apache.org/jira/browse/PHOENIX-4008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614146#comment-16614146 ]


Bin Shi edited comment on PHOENIX-4008 at 9/14/18 12:58 AM:
------------------------------------------------------------

[~tdsilva]

If I understand you correctly, stats collection currently includes deleted 
rows, which makes table sampling that relies on stats inaccurate. The changes 
you suggest are: 1. By default, do not count deleted rows in the collected 
stats; 2. Make including deleted rows optional, and use "UPDATE STATISTICS 
(INCLUDE DELETED ROWS)" to indicate explicitly that the stats should include 
them.


was (Author: bin shi):
[~tdsilva]

If I understand you correctly, stats collection currently includes deleted 
rows, which makes table sampling that relies on stats inaccurate. The changes 
you suggest are: 1. By default, do not count deleted rows in the collected 
stats; 2. Make including deleted rows optional, and use "UPDATE STATISTICS 
(INCLUDE DELETED ROWS)" to indicate explicitly that the stats should include 
them. Shall we use a global configuration to indicate whether the stats should 
include deleted rows when they are collected during major compaction or by 
jobs that collect stats from a snapshot?

> UPDATE STATISTIC should run raw scan with all versions of cells
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-4008
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4008
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Samarth Jain
>            Assignee: Bin Shi
>            Priority: Major
>
> In order to truly measure the size of data when calculating guide posts, 
> UPDATE STATISTIC should run a raw scan to take into account all versions of 
> cells. We should also be setting the max versions on the scan.
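
To illustrate why the description asks for a raw, all-versions scan, here is a minimal self-contained sketch. It is a toy model, not Phoenix or HBase code: the `Cell` class, `rawSize`, `visibleSize`, and the byte sizes are all hypothetical. It also assumes a simplified rule in which a delete marker hides every older cell in the same column. In the HBase client, the equivalent request would presumably be made with `Scan.setRaw(true)` plus a max-versions setting on the scan.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: contrasts the bytes seen by a raw, all-versions scan
// with the bytes seen by a regular scan that returns the newest live cell.
class RawScanSizeSketch {

    // Hypothetical stand-in for a stored cell; not org.apache.hadoop.hbase.Cell.
    static final class Cell {
        final String row;
        final String column;
        final long timestamp;
        final boolean deleteMarker;
        final int sizeBytes;

        Cell(String row, String column, long timestamp, boolean deleteMarker, int sizeBytes) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
            this.deleteMarker = deleteMarker;
            this.sizeBytes = sizeBytes;
        }
    }

    // Raw view: every version and every delete marker counts toward the size.
    static long rawSize(List<Cell> cells) {
        long total = 0;
        for (Cell c : cells) {
            total += c.sizeBytes;
        }
        return total;
    }

    // Regular view: only the newest cell per (row, column) is considered,
    // and it contributes nothing if it is a delete marker (simplified rule).
    static long visibleSize(List<Cell> cells) {
        Map<String, Cell> newest = new HashMap<>();
        for (Cell c : cells) {
            String key = c.row + "/" + c.column;
            Cell current = newest.get(key);
            if (current == null || c.timestamp > current.timestamp) {
                newest.put(key, c);
            }
        }
        long total = 0;
        for (Cell c : newest.values()) {
            if (!c.deleteMarker) {
                total += c.sizeBytes;
            }
        }
        return total;
    }

    // Made-up sample data: r1 has two versions, r2 was written and then deleted.
    static List<Cell> sample() {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("r1", "a", 1L, false, 100));
        cells.add(new Cell("r1", "a", 2L, false, 120));
        cells.add(new Cell("r2", "a", 1L, false, 80));
        cells.add(new Cell("r2", "a", 2L, true, 20));
        return cells;
    }

    public static void main(String[] args) {
        List<Cell> cells = sample();
        System.out.println("raw size:     " + rawSize(cells));
        System.out.println("visible size: " + visibleSize(cells));
    }
}
```

With this sample data the raw view totals 320 bytes while the regular view totals 120, so guide posts sized from the regular view would understate the data actually stored. The same gap is why counting deleted rows can skew row-based sampling, as discussed in the comment above.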



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
