[ 
https://issues.apache.org/jira/browse/PHOENIX-4008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620076#comment-16620076
 ] 

ASF GitHub Bot commented on PHOENIX-4008:
-----------------------------------------

Github user karanmehta93 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/351#discussion_r218662932
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java ---
    @@ -1279,6 +1279,7 @@ private long updateStatisticsInternal(PName physicalName, PTable logicalTable, M
                 MutationPlan plan = compiler.compile(Collections.singletonList(tableRef), null, cfs, null, clientTimeStamp);
                 Scan scan = plan.getContext().getScan();
                 scan.setCacheBlocks(false);
    +            scan.readAllVersions();
    --- End diff --
    
    If the table's schema is configured to keep multiple versions, then it is 
functionally correct to consider all of them (for table sampling as well as for 
update stats). `scan.readAllVersions()` returns every version of a cell that is 
retained at the HBase level, i.e. the versions that are not removed by a major 
compaction. Without this call, a scan returns only one version, which is the 
default. It does not return deleted versions of a cell.
    `scan.setRaw(true)` returns all versions, including the ones that are deleted. 
I earlier suggested that @BinShi-SecularBird use it, but we will not use it within 
the scope of this Jira, since the problem we are trying to solve here is 
different. Bin will file a separate Jira to address the deleted cells issue.


> UPDATE STATISTIC should collect all versions of cells
> -----------------------------------------------------
>
>                 Key: PHOENIX-4008
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4008
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Samarth Jain
>            Assignee: Bin Shi
>            Priority: Major
>         Attachments: PHOENIX-4008_0918.patch
>
>
> In order to truly measure the size of data when calculating guide posts, 
> UPDATE STATISTIC should take into account all versions of cells. We should 
> also be setting the max versions on the scan.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
