[ 
https://issues.apache.org/jira/browse/PHOENIX-4008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16640826#comment-16640826
 ] 

Bin Shi commented on PHOENIX-4008:
----------------------------------

Several thoughts:
 # It depends on how we replicate a table. Some systems use a Merkle tree 
underneath for replication, so the replication process itself doubles as the 
verification process.
 # Once stats collection includes deleted rows, table sampling can still be 
used in the replication scenario you described, provided the comparison is made 
after a major compaction has been triggered. That requirement can be well 
documented.
 # Accuracy is a must-have for Phoenix Stats, so we should collect the deleted 
rows. The replication scenario is too weak a justification for adding an 
"UPDATE STATISTICS (EXCLUDING DELETED)" option and the extra complexity it 
would bring. 

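To illustrate the Merkle tree point, here is a minimal sketch of how comparing two root hashes can double as verification. It is not how Phoenix or HBase replicates data; the class name MerkleSketch, the inSync helper, and the "key=value" string encoding of a row are all hypothetical, and an unpaired last node is simply hashed with itself.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Phoenix or HBase.
public class MerkleSketch {
    // SHA-256 over the concatenation of the given byte arrays.
    static byte[] sha256(byte[]... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (byte[] p : parts) md.update(p);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Root hash over an ordered list of rows (assumed "key=value" strings).
    static byte[] rootHash(List<String> rows) {
        List<byte[]> level = new ArrayList<>();
        for (String row : rows) {
            level.add(sha256(row.getBytes(StandardCharsets.UTF_8)));
        }
        if (level.isEmpty()) return sha256(new byte[0]);
        while (level.size() > 1) {
            List<byte[]> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                byte[] left = level.get(i);
                // Simplification: an unpaired last node is paired with itself.
                byte[] right = (i + 1 < level.size()) ? level.get(i + 1) : left;
                next.add(sha256(left, right));
            }
            level = next;
        }
        return level.get(0);
    }

    // Two copies are treated as in sync when their root hashes match.
    static boolean inSync(List<String> source, List<String> replica) {
        return Arrays.equals(rootHash(source), rootHash(replica));
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("k1=a", "k2=b", "k3=c");
        List<String> replica = Arrays.asList("k1=a", "k2=b", "k3=X");
        System.out.println(inSync(source, source));  // prints "true"
        System.out.println(inSync(source, replica)); // prints "false"
    }
}
```

A real system would also compare subtree hashes to locate the rows that differ, which this sketch leaves out.
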
> UPDATE STATISTIC should collect all versions of cells
> -----------------------------------------------------
>
>                 Key: PHOENIX-4008
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4008
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Samarth Jain
>            Assignee: Bin Shi
>            Priority: Major
>             Fix For: 4.15.0, 5.1.0
>
>         Attachments: PHOENIX-4008_0918.patch, PHOENIX-4008_0920.patch, 
> PHONEIX-4008.4.X-HBase-1.2.001.patch, PHONEIX-4008.4.X-HBase-1.3.001.patch, 
> PHONEIX-4008.4.X-HBase-1.4.001.patch
>
>
> In order to truly measure the size of data when calculating guide posts, 
> UPDATE STATISTIC should take into account all versions of cells. We should 
> also be setting the max versions on the scan.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
