[ https://issues.apache.org/jira/browse/PHOENIX-4008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16615202#comment-16615202 ]
Bin Shi edited comment on PHOENIX-4008 at 9/14/18 6:28 PM:
-----------------------------------------------------------
--> I think this needs to be made configurable per table depending on what end
users are using stats for. If they collect stats in order to use the table
sampling feature then delete markers should be excluded.
Are we saying that if they collect stats to use the table sampling feature then
they won't use stats for query complexity estimation, query optimization and
intra-region read parallelization, and vice versa? I really doubt it, though I
may have missed some context. Even if that holds for the known use cases for
now, we can't generally make such an assumption for a general system like
Phoenix.
was (Author: bin shi):
--> I think this needs to be made configurable per table depending on what end
users are using stats for. If they collect stats in order to use the table
sampling feature then delete markers should be excluded.
Are we saying that if they collect stats to use the table sampling feature then
they won't use stats for query complexity estimation, query optimization and
intra-region read parallelization, and vice versa? Even if that holds for the
known use cases for now, we can't generally make such an assumption for a
general system like Phoenix.
> UPDATE STATISTIC should run raw scan with all versions of cells
> ---------------------------------------------------------------
>
> Key: PHOENIX-4008
> URL: https://issues.apache.org/jira/browse/PHOENIX-4008
> Project: Phoenix
> Issue Type: Bug
> Reporter: Samarth Jain
> Assignee: Bin Shi
> Priority: Major
>
> In order to truly measure the size of data when calculating guide posts,
> UPDATE STATISTIC should run a raw scan to take into account all versions of
> cells. We should also be setting the max versions on the scan.
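
To illustrate why a raw scan changes the measured size, here is a minimal toy
model in Python. It is not Phoenix or HBase code: the Cell class, the sample
data, and the function names are all hypothetical, and delete handling is
simplified to "a delete marker that is the newest cell hides the column". In
HBase itself the corresponding knobs are, as far as I know, Scan.setRaw(true)
and Scan.setMaxVersions().

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for a stored cell; not an HBase class."""
    row: str
    column: str
    timestamp: int
    is_delete: bool   # True for a delete marker
    size_bytes: int   # bytes the cell occupies on disk


def visible_scan(cells: Iterable[Cell]) -> List[Cell]:
    """Model a default scan: newest cell per (row, column), hidden if deleted."""
    latest: Dict[Tuple[str, str], Cell] = {}
    for cell in cells:
        key = (cell.row, cell.column)
        if key not in latest or cell.timestamp > latest[key].timestamp:
            latest[key] = cell
    # If the newest cell is a delete marker, the column is not returned at all.
    return [c for c in latest.values() if not c.is_delete]


def raw_scan(cells: Iterable[Cell]) -> List[Cell]:
    """Model a raw scan with all versions: every cell, delete markers included."""
    return list(cells)


def total_size(cells: Iterable[Cell]) -> int:
    """Sum the bytes of the given cells."""
    return sum(c.size_bytes for c in cells)


# Assumed sample data: r1 has two versions, r2 was written and then deleted.
SAMPLE_CELLS = [
    Cell("r1", "cf:a", 1, False, 100),
    Cell("r1", "cf:a", 2, False, 120),
    Cell("r2", "cf:a", 1, False, 80),
    Cell("r2", "cf:a", 2, True, 20),
]

if __name__ == "__main__":
    print("visible bytes:", total_size(visible_scan(SAMPLE_CELLS)))  # 120
    print("raw bytes:    ", total_size(raw_scan(SAMPLE_CELLS)))      # 320
```

In this toy data the default scan sees 120 bytes while the raw scan sees 320,
which is the gap between what a query returns and what a scan over the stored
data has to read. Whether the delete markers should count toward that total is
the question discussed in the comment above.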