[ https://issues.apache.org/jira/browse/PHOENIX-4008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620078#comment-16620078 ]

ASF GitHub Bot commented on PHOENIX-4008:
-----------------------------------------

Github user twdsilva commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/351#discussion_r218663493
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java ---
    @@ -1279,6 +1279,7 @@ private long updateStatisticsInternal(PName physicalName, PTable logicalTable, M
                 MutationPlan plan = compiler.compile(Collections.singletonList(tableRef), null, cfs, null, clientTimeStamp);
                 Scan scan = plan.getContext().getScan();
                 scan.setCacheBlocks(false);
    +            scan.readAllVersions();
    --- End diff --
    
    scan.readAllVersions() isn't used when we run a query with table sampling.
    If a row has 100 versions and you run a query, you only see the latest one,
    or, if an SCN is set, the newest version with a timestamp before the SCN.
    If the guideposts are calculated using all versions, they count bytes that
    the query never reads, so sampling based on them will be incorrect.
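    The mismatch described above can be made concrete with a small model. The
    sketch below is hypothetical and uses no Phoenix or HBase APIs: it assumes a
    single row written 100 times at timestamps 1..100 with 50 bytes per version,
    and models an SCN as an exclusive upper bound on visible timestamps (an
    assumption about the exact boundary). A query reads one version; a stats
    scan that reads all versions counts every one of them.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model (not Phoenix or HBase code) of one row that has been
 * written many times. It contrasts what a normal query reads with what a
 * statistics scan would count if it read every stored version.
 */
public class VersionVisibility {

    /** One stored version of a cell: write timestamp and serialized size. */
    static final class CellVersion {
        final long timestamp;
        final int sizeBytes;

        CellVersion(long timestamp, int sizeBytes) {
            this.timestamp = timestamp;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Newest version with timestamp < maxTimestampExclusive, or null if none.
     * Pass Long.MAX_VALUE to model a query with no SCN set (assumed semantics).
     */
    static CellVersion visibleVersion(List<CellVersion> versions, long maxTimestampExclusive) {
        CellVersion newest = null;
        for (CellVersion v : versions) {
            if (v.timestamp < maxTimestampExclusive
                    && (newest == null || v.timestamp > newest.timestamp)) {
                newest = v;
            }
        }
        return newest;
    }

    /** Bytes a query reads for this row: only the single visible version. */
    static long queryBytes(List<CellVersion> versions, long maxTimestampExclusive) {
        CellVersion v = visibleVersion(versions, maxTimestampExclusive);
        return v == null ? 0 : v.sizeBytes;
    }

    /** Bytes a statistics scan would count if it read all versions. */
    static long allVersionsBytes(List<CellVersion> versions) {
        long total = 0;
        for (CellVersion v : versions) {
            total += v.sizeBytes;
        }
        return total;
    }

    /** Builds a row written `count` times at timestamps 1..count. */
    static List<CellVersion> rowWithVersions(int count, int sizeBytes) {
        List<CellVersion> row = new ArrayList<>();
        for (long ts = 1; ts <= count; ts++) {
            row.add(new CellVersion(ts, sizeBytes));
        }
        return row;
    }

    public static void main(String[] args) {
        List<CellVersion> row = rowWithVersions(100, 50);

        System.out.println("no SCN:  query sees ts " + visibleVersion(row, Long.MAX_VALUE).timestamp);
        System.out.println("SCN=40:  query sees ts " + visibleVersion(row, 40).timestamp);
        System.out.println("query bytes:        " + queryBytes(row, Long.MAX_VALUE));
        System.out.println("all-versions bytes: " + allVersionsBytes(row));
    }
}
```

    Running main reports that the query sees timestamp 100 (or 39 with SCN 40)
    and reads 50 bytes, while counting all versions yields 5000 bytes for the
    same row, a 100x overestimate of what the query actually scans.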


> UPDATE STATISTIC should collect all versions of cells
> -----------------------------------------------------
>
>                 Key: PHOENIX-4008
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4008
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Samarth Jain
>            Assignee: Bin Shi
>            Priority: Major
>         Attachments: PHOENIX-4008_0918.patch
>
>
> In order to truly measure the size of data when calculating guide posts,
> UPDATE STATISTICS should take into account all versions of cells. We should
> also be setting the max versions on the scan.
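The trade-off in this issue can be sketched in isolation. The model below is
hypothetical and is not Phoenix's statistics collector: it assumes a guidepost
is emitted each time the accumulated byte count reaches a fixed width, with the
counter reset after each guidepost, and compares counting only the latest
version of each row against counting all stored versions. Row keys, sizes,
version counts, and the 200-byte width are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Phoenix's StatisticsCollector) of emitting a
 * guidepost each time the accumulated byte count reaches a fixed width.
 */
public class GuidepostSketch {

    /** A row key, the size of one version, and how many versions are stored. */
    static final class Row {
        final String key;
        final long bytesPerVersion;
        final int versions;

        Row(String key, long bytesPerVersion, int versions) {
            this.key = key;
            this.bytesPerVersion = bytesPerVersion;
            this.versions = versions;
        }
    }

    /**
     * Walks rows in key order and returns the keys at which a guidepost is
     * emitted. When countAllVersions is false, only the latest version of
     * each row contributes to the byte count.
     */
    static List<String> guideposts(List<Row> rows, long widthBytes, boolean countAllVersions) {
        List<String> posts = new ArrayList<>();
        long accumulated = 0;
        for (Row r : rows) {
            accumulated += countAllVersions ? r.bytesPerVersion * r.versions : r.bytesPerVersion;
            if (accumulated >= widthBytes) {
                posts.add(r.key);
                accumulated = 0; // start a new chunk after each guidepost
            }
        }
        return posts;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        // Every row holds 100 bytes per version; rows c and d are "hot"
        // and keep 5 stored versions, the rest keep a single version.
        for (String key : new String[] {"a", "b", "c", "d", "e", "f"}) {
            int versions = (key.equals("c") || key.equals("d")) ? 5 : 1;
            rows.add(new Row(key, 100, versions));
        }

        long width = 200;
        System.out.println("latest only:  " + guideposts(rows, width, false)); // [b, d, f]
        System.out.println("all versions: " + guideposts(rows, width, true));  // [b, c, d, f]
    }
}
```

With a 200-byte width, counting only the latest version places guideposts at
b, d and f, so each chunk holds 200 bytes that a query would read. Counting all
versions adds a guidepost at c, and the chunks ending at c and d each hold only
100 visible bytes. This uneven relationship between guidepost chunks and the
data a query actually reads is the kind of skew the review comment is concerned
about when guideposts drive sampling.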



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
