Github user BinShi-SecularBird commented on a diff in the pull request:
https://github.com/apache/phoenix/pull/351#discussion_r218903831
--- Diff:
phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java ---
@@ -1279,6 +1279,7 @@ private long updateStatisticsInternal(PName
physicalName, PTable logicalTable, M
MutationPlan plan =
compiler.compile(Collections.singletonList(tableRef), null, cfs, null,
clientTimeStamp);
Scan scan = plan.getContext().getScan();
scan.setCacheBlocks(false);
+ scan.readAllVersions();
--- End diff --
BTW, TABLE SAMPLING(n) means running the query over n percent of the
rows, and including multiple versions of cells in the stats doesn't change the
row count, so in theory the two shouldn't conflict. Even if we
want to reduce the time complexity from O(n) to O(m), where n is the number of rows
in the table and m is the number of guide posts (or parallel scans), the
sampling algorithm should adhere to this basic fact.
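
To make the point concrete, here is a minimal sketch in plain Java. It is not Phoenix or HBase code: the `Cell` class, the hash-bucket sampling rule, and every method name are hypothetical and only model the idea. It shows that a row-level sample picks the same rows whether the input holds every cell version or only the newest one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SamplingSketch {
    // Hypothetical stand-in for a stored cell: a row key plus a version timestamp.
    static final class Cell {
        final String row;
        final long timestamp;
        Cell(String row, long timestamp) {
            this.row = row;
            this.timestamp = timestamp;
        }
    }

    // Distinct row keys present in the given cells.
    static Set<String> rowsOf(List<Cell> cells) {
        Set<String> rows = new TreeSet<>();
        for (Cell c : cells) {
            rows.add(c.row);
        }
        return rows;
    }

    // Keep only the newest version of each row (models a scan limited to one version).
    static List<Cell> latestVersionOnly(List<Cell> cells) {
        Map<String, Cell> newest = new LinkedHashMap<>();
        for (Cell c : cells) {
            Cell prev = newest.get(c.row);
            if (prev == null || c.timestamp > prev.timestamp) {
                newest.put(c.row, c);
            }
        }
        return new ArrayList<>(newest.values());
    }

    // Assumed sampling rule: a row is kept when its hash bucket (0..99) is below percent.
    // The decision depends on the row key only, never on how many versions exist.
    static boolean rowSampled(String row, int percent) {
        return Math.floorMod(row.hashCode(), 100) < percent;
    }

    // Apply the row-level sampling rule to a list of cells.
    static List<Cell> sample(List<Cell> cells, int percent) {
        List<Cell> kept = new ArrayList<>();
        for (Cell c : cells) {
            if (rowSampled(c.row, percent)) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Cell> allVersions = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            // Every row gets between one and three versions.
            for (long ts = 1; ts <= 1 + (i % 3); ts++) {
                allVersions.add(new Cell("row" + i, ts));
            }
        }
        List<Cell> latest = latestVersionOnly(allVersions);
        System.out.println("rows (all versions):   " + rowsOf(allVersions).size());
        System.out.println("rows (latest only):    " + rowsOf(latest).size());
        System.out.println("same sampled row set:  "
                + rowsOf(sample(allVersions, 50)).equals(rowsOf(sample(latest, 50))));
    }
}
```

Because the keep/drop decision is keyed on the row, reading all versions only changes how many cells each sampled row contributes to the stats, not which rows are sampled.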
---