[ 
https://issues.apache.org/jira/browse/PHOENIX-4008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614221#comment-16614221
 ] 

Bin Shi edited comment on PHOENIX-4008 at 9/14/18 1:26 AM:
-----------------------------------------------------------

[~tdsilva]

I checked the code: currently, the scan issued by the "UPDATE STATISTICS" 
command is not a raw scan, so it seems that the stats won't include delete 
markers or deleted rows. Is my understanding correct?

If we make including deleted rows in the stats optional and provide 
"UPDATE STATISTICS (INCLUDE DELETED ROWS)", do we still have a problem when we 
call "UPDATE STATISTICS (INCLUDE DELETED ROWS)" and then call "select * from ... 
TABLESAMPLE(...)", where table sampling uses stats that include the deleted 
rows?

Also, it seems that both the scan issued by the "UPDATE STATISTICS" command and 
the scans from SELECT queries read only the latest version of a cell. Why do we 
want to read all the versions of a cell?

 



> UPDATE STATISTIC should run raw scan with all versions of cells
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-4008
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4008
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Samarth Jain
>            Assignee: Bin Shi
>            Priority: Major
>
> In order to truly measure the size of data when calculating guide posts, 
> UPDATE STATISTIC should run a raw scan to take into account all versions of 
> cells. We should also be setting the max versions on the scan.
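
To illustrate why the issue asks for a raw, all-versions scan, here is a minimal toy sketch of the difference in measured size. It does not use Phoenix or HBase code: the `Cell` class, the byte sizes, and the sample data are hypothetical, and delete handling is simplified to "a row whose newest cell is a delete marker is hidden". In HBase client terms, the raw side roughly corresponds to a `Scan` with `setRaw(true)` and all versions requested.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RawScanSizeSketch {
    /** Toy stand-in for a stored cell (hypothetical, not an HBase class). */
    static final class Cell {
        final String row;
        final long timestamp;
        final int sizeBytes;
        final boolean deleteMarker;

        Cell(String row, long timestamp, int sizeBytes, boolean deleteMarker) {
            this.row = row;
            this.timestamp = timestamp;
            this.sizeBytes = sizeBytes;
            this.deleteMarker = deleteMarker;
        }
    }

    /** Bytes a raw, all-versions scan would count: every stored cell, delete markers included. */
    static long rawBytes(List<Cell> cells) {
        long total = 0;
        for (Cell c : cells) {
            total += c.sizeBytes;
        }
        return total;
    }

    /** Bytes a regular scan would count: the newest cell per row, skipping rows hidden by a delete marker. */
    static long latestVisibleBytes(List<Cell> cells) {
        Map<String, Cell> newest = new HashMap<>();
        for (Cell c : cells) {
            Cell current = newest.get(c.row);
            if (current == null || c.timestamp > current.timestamp) {
                newest.put(c.row, c);
            }
        }
        long total = 0;
        for (Cell c : newest.values()) {
            if (!c.deleteMarker) {
                total += c.sizeBytes;
            }
        }
        return total;
    }

    /** Made-up data: r1 has two versions, r2 was deleted, r3 has one version. */
    static List<Cell> sample() {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("r1", 1L, 100, false));
        cells.add(new Cell("r1", 2L, 120, false));
        cells.add(new Cell("r2", 1L, 80, false));
        cells.add(new Cell("r2", 2L, 20, true)); // delete marker
        cells.add(new Cell("r3", 1L, 50, false));
        return cells;
    }

    public static void main(String[] args) {
        List<Cell> cells = sample();
        // prints "raw=370 visible=170"
        System.out.println("raw=" + rawBytes(cells) + " visible=" + latestVisibleBytes(cells));
    }
}
```

In this made-up data set the regular scan sees less than half of the stored bytes, so guide posts spaced by that measure would cover more on-disk data than intended. Whether that matters for a consumer such as TABLESAMPLE is the open question raised in the comment above.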



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
