[GitHub] phoenix pull request #351: PHOENIX-4008: UPDATE STATISTIC should run raw sca...

karanmehta93 Tue, 18 Sep 2018 21:10:10 -0700

Github user karanmehta93 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/351#discussion_r218662932
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java ---
    @@ -1279,6 +1279,7 @@ private long updateStatisticsInternal(PName physicalName, PTable logicalTable, M
                 MutationPlan plan = compiler.compile(Collections.singletonList(tableRef), null, cfs, null, clientTimeStamp);
                 Scan scan = plan.getContext().getScan();
                 scan.setCacheBlocks(false);
    +            scan.readAllVersions();
    --- End diff --
    
    If the table's schema is configured to keep multiple versions, then it is
functionally correct to consider all of them (for table sampling as well as
for update stats). `scan.readAllVersions()` returns the multiple versions of a
cell that are configured at the HBase level; these are the cells that are not
removed by a major compaction. Without this call, a scan returns only one
version by default. It does not return deleted versions of a cell.
    `scan.setRaw()` returns all versions, including the ones that are deleted.
I had earlier suggested that @BinShi-SecularBird use it, but we will not use it
in the scope of this Jira, since the problem we are trying to solve here is
different. Bin will file a separate Jira to address the deleted cells issue.
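
To make the distinction concrete, here is a toy in-memory model of the three
scan behaviours described above. It is not HBase or Phoenix code: the class
`ScanVersionsSketch`, its `Mode` enum, the `scan` helper and the sample data
are all hypothetical, and the model is deliberately simplified (one cell, no
compaction, no delete markers as separate entries).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only; every name here is an assumption made for illustration.
final class ScanVersionsSketch {

    // One stored version of a single cell. 'deleted' means a delete masks it.
    static final class Version {
        final long timestamp;
        final String value;
        final boolean deleted;

        Version(long timestamp, String value, boolean deleted) {
            this.timestamp = timestamp;
            this.value = value;
            this.deleted = deleted;
        }
    }

    // DEFAULT      ~ a plain scan (one live version)
    // ALL_VERSIONS ~ the effect described for scan.readAllVersions()
    // RAW          ~ the effect described for scan.setRaw()
    enum Mode { DEFAULT, ALL_VERSIONS, RAW }

    // 'versions' must be ordered newest first. 'maxVersions' stands in for
    // the number of versions the table is configured to keep.
    static List<String> scan(List<Version> versions, int maxVersions, Mode mode) {
        List<String> out = new ArrayList<>();
        if (mode == Mode.RAW) {
            // Raw: everything still stored, deleted versions included.
            for (Version v : versions) {
                out.add(v.value);
            }
            return out;
        }
        int limit = (mode == Mode.DEFAULT) ? 1 : maxVersions;
        for (Version v : versions) {
            if (v.deleted) {
                continue; // deleted versions are never returned here
            }
            if (out.size() == limit) {
                break;
            }
            out.add(v.value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Version> cell = Arrays.asList(
                new Version(3L, "v3", false),
                new Version(2L, "v2", true),   // deleted version
                new Version(1L, "v1", false));

        System.out.println(scan(cell, 3, Mode.DEFAULT));
        System.out.println(scan(cell, 3, Mode.ALL_VERSIONS));
        System.out.println(scan(cell, 3, Mode.RAW));
    }
}
```

In this model the default mode yields only `v3`, the all-versions mode yields
`v3` and `v1`, and the raw mode additionally yields the deleted `v2`. That
extra deleted version is the case deferred to the separate Jira.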
