Github user karanmehta93 commented on a diff in the pull request:
https://github.com/apache/phoenix/pull/351#discussion_r218662932
--- Diff:
phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java ---
@@ -1279,6 +1279,7 @@ private long updateStatisticsInternal(PName
physicalName, PTable logicalTable, M
MutationPlan plan =
compiler.compile(Collections.singletonList(tableRef), null, cfs, null,
clientTimeStamp);
Scan scan = plan.getContext().getScan();
scan.setCacheBlocks(false);
+ scan.readAllVersions();
--- End diff --
If the table's schema is configured to keep multiple versions, then it is
functionally correct to consider all of them (for table sampling as well as
update stats). `scan.readAllVersions()` returns the multiple versions of a
cell that are configured at the HBase level; these are the cells that won't be
removed by a major compaction. Without this call, a scan defaults to returning
just 1 version. It doesn't return deleted versions of a cell.
`scan.setRaw(true)` returns all the versions, even the ones that are deleted.
I had earlier suggested that @BinShi-SecularBird use it, but we will not use it
within the scope of this Jira, since the problem we are trying to solve here is
different. Bin will file a separate Jira to address the deleted cells issue.
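
To make the difference between the three read modes concrete, here is a rough
plain-Java sketch. It is only a toy model of the behaviour described above, not
HBase code: `VersionReadModes`, `CellVersion`, and `read` are hypothetical names,
and real delete handling in HBase (delete markers, timestamps, version counting)
is more involved than a per-cell `deleted` flag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VersionReadModes {

    // Hypothetical stand-in for one stored version of a cell; not an HBase type.
    static final class CellVersion {
        final long timestamp;
        final String value;
        final boolean deleted;

        CellVersion(long timestamp, String value, boolean deleted) {
            this.timestamp = timestamp;
            this.value = value;
            this.deleted = deleted;
        }
    }

    // Toy read path. `newestFirst` holds the stored versions, newest first.
    // `limit` caps how many live versions are returned; `includeDeleted`
    // models a raw read that also surfaces deleted versions.
    static List<CellVersion> read(List<CellVersion> newestFirst, int limit, boolean includeDeleted) {
        List<CellVersion> out = new ArrayList<>();
        for (CellVersion v : newestFirst) {
            if (v.deleted && !includeDeleted) {
                continue; // non-raw reads never see deleted versions
            }
            if (out.size() >= limit) {
                break; // version limit reached
            }
            out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed example data: three stored versions, the middle one deleted.
        List<CellVersion> cells = Arrays.asList(
                new CellVersion(30L, "c", false),
                new CellVersion(20L, "b", true),
                new CellVersion(10L, "a", false));
        int configuredVersions = 3; // assumed column family VERSIONS setting

        // Default scan: a single live version.
        System.out.println("default:  " + read(cells, 1, false).size());
        // Modelled on scan.readAllVersions(): all configured live versions.
        System.out.println("all:      " + read(cells, configuredVersions, false).size());
        // Modelled on a raw scan: deleted versions are returned as well.
        System.out.println("raw:      " + read(cells, Integer.MAX_VALUE, true).size());
    }
}
```

In this model the change in the diff corresponds to moving from the first call
to the second; the third call is what the separate Jira for deleted cells would
be concerned with.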
---