[GitHub] phoenix pull request #351: PHOENIX-4008: UPDATE STATISTIC should run raw sca...

BinShi-SecularBird Wed, 19 Sep 2018 10:56:37 -0700

Github user BinShi-SecularBird commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/351#discussion_r218903831
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java ---
    @@ -1279,6 +1279,7 @@ private long updateStatisticsInternal(PName physicalName, PTable logicalTable, M
                 MutationPlan plan = compiler.compile(Collections.singletonList(tableRef), null, cfs, null, clientTimeStamp);
                 Scan scan = plan.getContext().getScan();
                 scan.setCacheBlocks(false);
    +            scan.readAllVersions();
    --- End diff --
    
    BTW, TABLE SAMPLING(n) means running the query over n percent of the rows,
    and including multiple versions of cells in the stats doesn't change the row
    count, so in theory the two shouldn't conflict. Even if we want to reduce the
    time complexity from O(n) to O(m), where n is the number of rows in the table
    and m is the number of guideposts (or parallel scans), the sampling algorithm
    should still respect this basic fact.
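    A minimal, self-contained sketch of the point above, under stated assumptions: `Cell`, `countRows`, `latestVersionsOnly`, `inSample`, and `countSampledRows` are hypothetical stand-ins invented for illustration, not Phoenix's stats collector or its actual sampling implementation. It models a table as a list of versioned cells and shows that keeping every version (as a scan with `readAllVersions()` would) changes the cell count but not the row count, and that a sampling decision made per row key selects the same rows either way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class VersionSamplingSketch {

    // Hypothetical stand-in for an HBase cell: row key, column, and version timestamp.
    static final class Cell {
        final String row;
        final String column;
        final long timestamp;

        Cell(String row, String column, long timestamp) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
        }
    }

    // Counts distinct row keys; extra versions of a cell never add a row.
    static int countRows(List<Cell> cells) {
        Set<String> rows = new HashSet<>();
        for (Cell c : cells) {
            rows.add(c.row);
        }
        return rows.size();
    }

    // Keeps only the newest version of each (row, column), like a scan that
    // does not ask for all versions.
    static List<Cell> latestVersionsOnly(List<Cell> cells) {
        Map<String, Cell> newest = new LinkedHashMap<>();
        for (Cell c : cells) {
            newest.merge(c.row + "\u0000" + c.column, c,
                    (a, b) -> a.timestamp >= b.timestamp ? a : b);
        }
        return new ArrayList<>(newest.values());
    }

    // Hypothetical row-level sampling: the keep/drop decision depends only on
    // the row key, so all versions of a row are kept or dropped together.
    static boolean inSample(String row, int percent) {
        return Math.floorMod(row.hashCode(), 100) < percent;
    }

    // Counts distinct rows that fall into the sample.
    static int countSampledRows(List<Cell> cells, int percent) {
        Set<String> rows = new HashSet<>();
        for (Cell c : cells) {
            if (inSample(c.row, percent)) {
                rows.add(c.row);
            }
        }
        return rows.size();
    }

    public static void main(String[] args) {
        List<Cell> allVersions = Arrays.asList(
                new Cell("r1", "cf:a", 1), new Cell("r1", "cf:a", 2), new Cell("r1", "cf:a", 3),
                new Cell("r2", "cf:a", 1),
                new Cell("r3", "cf:b", 5), new Cell("r3", "cf:b", 6));
        List<Cell> latest = latestVersionsOnly(allVersions);

        // prints "6 cells vs 3 cells"
        System.out.println(allVersions.size() + " cells vs " + latest.size() + " cells");
        // prints "3 rows vs 3 rows"
        System.out.println(countRows(allVersions) + " rows vs " + countRows(latest) + " rows");
        // Sampled row counts match for every percentage, versions or not.
        for (int percent : new int[] {0, 10, 50, 100}) {
            System.out.println(percent + "%: " + countSampledRows(allVersions, percent)
                    + " vs " + countSampledRows(latest, percent));
        }
    }
}
```

    The design choice that matters here is that `inSample` looks only at the row key; a sampler that decided per cell would over-weight rows that carry many versions, which is exactly the conflict the comment argues should not arise.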
