[jira] [Comment Edited] (PHOENIX-4008) UPDATE STATISTIC should run raw scan with all versions of cells

Bin Shi (JIRA) Fri, 14 Sep 2018 11:29:43 -0700


    [ 
https://issues.apache.org/jira/browse/PHOENIX-4008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16615202#comment-16615202
 ]


Bin Shi edited comment on PHOENIX-4008 at 9/14/18 6:28 PM:
-----------------------------------------------------------

--> I think this needs to be made configurable per table depending on what end 
users are using stats for. If they collect stats in order to use the table 
sampling feature then delete markers should be excluded. 

Are we saying that if they collect stats to use the table sampling feature then 
they won't also use stats for query complexity estimation, query optimization, 
and intra-region read parallelization, and vice versa? I really doubt it, though 
maybe I missed some context. Even if that holds for the known use cases today, 
we can't make such an assumption for a general-purpose system like Phoenix.

 


was (Author: bin shi):
--> I think this needs to be made configurable per table depending on what end 
users are using stats for. If they collect stats in order to use the table 
sampling feature then delete markers should be excluded. 

Are we saying that if they collect stats to use the table sampling feature then 
they won't also use stats for query complexity estimation, query optimization, 
and intra-region read parallelization, and vice versa? Even if that holds for 
the known use cases today, we can't make such an assumption for a 
general-purpose system like Phoenix.

 

> UPDATE STATISTIC should run raw scan with all versions of cells
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-4008
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4008
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Samarth Jain
>            Assignee: Bin Shi
>            Priority: Major
>
> In order to truly measure the size of data when calculating guide posts, 
> UPDATE STATISTIC should run a raw scan that takes into account all versions 
> of cells. We should also set the max versions on the scan.
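
The gap between the two kinds of scan can be illustrated with a toy model. This is a sketch only, not Phoenix or HBase code: the class RawScanSizeSketch, its Cell type, and the sample data are all hypothetical, and deletes are simplified to a row-level marker that masks every put at or before its timestamp. In HBase itself, a raw scan over all versions is usually requested on the Scan object through setRaw(true) together with setMaxVersions() (or readAllVersions() in HBase 2.x).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: every name here is hypothetical, none of it is Phoenix or HBase API. */
final class RawScanSizeSketch {

    /** One stored cell version. A delete marker is modeled as a cell with deleteMarker == true. */
    static final class Cell {
        final String row;
        final long timestamp;
        final int sizeBytes;
        final boolean deleteMarker;

        Cell(String row, long timestamp, int sizeBytes, boolean deleteMarker) {
            this.row = row;
            this.timestamp = timestamp;
            this.sizeBytes = sizeBytes;
            this.deleteMarker = deleteMarker;
        }
    }

    /** Bytes seen by a raw, all-versions scan: every stored cell, delete markers included. */
    static long rawAllVersionsBytes(List<Cell> cells) {
        long total = 0;
        for (Cell c : cells) {
            total += c.sizeBytes;
        }
        return total;
    }

    /** Bytes seen by a regular scan: the newest put per row, unless a delete marker masks it. */
    static long visibleBytes(List<Cell> cells) {
        Map<String, Cell> newestPut = new HashMap<>();
        Map<String, Long> newestDelete = new HashMap<>();
        for (Cell c : cells) {
            if (c.deleteMarker) {
                newestDelete.merge(c.row, c.timestamp, Long::max);
            } else {
                newestPut.merge(c.row, c, (a, b) -> a.timestamp >= b.timestamp ? a : b);
            }
        }
        long total = 0;
        for (Cell put : newestPut.values()) {
            Long deletedAt = newestDelete.get(put.row);
            // Simplification: a delete masks every put at or before its timestamp.
            if (deletedAt == null || put.timestamp > deletedAt) {
                total += put.sizeBytes;
            }
        }
        return total;
    }

    /** Hypothetical sample: row1 has two versions, row2 was written and then deleted. */
    static List<Cell> sampleCells() {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("row1", 1L, 100, false)); // older version of row1
        cells.add(new Cell("row1", 2L, 120, false)); // current version of row1
        cells.add(new Cell("row2", 1L, 200, false)); // put that is deleted later
        cells.add(new Cell("row2", 3L, 30, true));   // delete marker for row2
        return cells;
    }
}
```

With this sample, the raw all-versions total is 450 bytes while the regular-scan total is 120 bytes. Guide posts computed from the second number would understate the data a region actually stores until compaction removes the old versions and markers, which is the concern the description raises.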



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
