[jira] [Comment Edited] (SPARK-44817) Incremental Stats Collection

Rakesh Raushan (Jira) Wed, 16 Aug 2023 06:09:06 -0700


    [ https://issues.apache.org/jira/browse/SPARK-44817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754694#comment-17754694 ]


Rakesh Raushan edited comment on SPARK-44817 at 8/16/23 1:08 PM:
-----------------------------------------------------------------

[~cloud_fan] [~gurwls223] [~maxgekk] [~dongjoon] What are your thoughts on 
this?
If this looks promising, I can work on raising a PR for it.


was (Author: rakson):
[~cloud_fan] [~gurwls223] [~maxgekk] What are your thoughts on this?
If this looks promising, I can work on raising a PR for it.

> Incremental Stats Collection
> ----------------------------
>
>                 Key: SPARK-44817
>                 URL: https://issues.apache.org/jira/browse/SPARK-44817
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Rakesh Raushan
>            Priority: Major
>
> Spark's Cost-Based Optimizer (CBO) depends on table and column statistics.
> After every DML query, table and column stats are invalidated unless 
> automatic stats updates are turned on. To keep stats up to date, we 
> need to run the `ANALYZE TABLE COMPUTE STATISTICS` command, which is very 
> expensive; it is not feasible to run it after every DML query.
> Instead, we can incrementally update the stats while each DML query 
> runs. This way, table and column stats would be fresh at all times 
> and CBO benefits can be applied. Initially, we could update only 
> table-level stats, and gradually extend this to column-level stats.
> *Pros:*
> 1. Optimizes queries over frequently updated tables.
> 2. Saves compute cycles by removing the dependency on 
> `ANALYZE TABLE COMPUTE STATISTICS` for updating stats.
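As a rough illustration of what "incrementally updating the stats" could mean for an append-only write (INSERT INTO), here is a minimal sketch, not part of the proposal: the stats of the newly written rows are folded into the existing stats instead of rescanning the whole table. `TableStats`, `ColumnStats` and `merge_append` are hypothetical names (not Spark's `CatalogStatistics` API), and the per-write delta is assumed to be collected by the write job. Sizes, row counts, null counts and min/max merge exactly for appends; distinct counts do not (they would need a mergeable sketch such as HyperLogLog), and after DELETE or UPDATE min/max cannot be derived from the delta, so those cases would presumably still fall back to `ANALYZE TABLE`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class ColumnStats:
    # Hypothetical per-column stats (illustration only, not Spark's classes).
    min_value: Optional[int]
    max_value: Optional[int]
    null_count: int


@dataclass(frozen=True)
class TableStats:
    # Hypothetical table-level stats as they might be stored in the catalog.
    size_in_bytes: int
    row_count: int
    col_stats: Dict[str, ColumnStats] = field(default_factory=dict)


def _merge_bound(a: Optional[int], b: Optional[int], pick) -> Optional[int]:
    # None means "no non-null value seen on that side"; keep the known side.
    if a is None:
        return b
    if b is None:
        return a
    return pick(a, b)


def merge_append(current: TableStats, delta: TableStats) -> TableStats:
    """Fold the stats of newly appended rows into the existing table stats.

    Assumption: valid only for append-only writes such as INSERT INTO.
    DELETE/UPDATE would still need a full ANALYZE TABLE ... COMPUTE STATISTICS.
    """
    merged_cols: Dict[str, ColumnStats] = {}
    # Columns missing from either side are dropped, since their stats would be stale.
    for name in current.col_stats.keys() & delta.col_stats.keys():
        old, new = current.col_stats[name], delta.col_stats[name]
        merged_cols[name] = ColumnStats(
            min_value=_merge_bound(old.min_value, new.min_value, min),
            max_value=_merge_bound(old.max_value, new.max_value, max),
            null_count=old.null_count + new.null_count,
        )
    return TableStats(
        size_in_bytes=current.size_in_bytes + delta.size_in_bytes,
        row_count=current.row_count + delta.row_count,
        col_stats=merged_cols,
    )


# Usage: stats before an INSERT, plus stats of the 20 rows it wrote.
before = TableStats(1000, 100, {"id": ColumnStats(1, 100, 0)})
batch = TableStats(250, 20, {"id": ColumnStats(101, 120, 2)})
after = merge_append(before, batch)
print(after.row_count, after.col_stats["id"])
# → 120 ColumnStats(min_value=1, max_value=120, null_count=2)
```

The merge touches only the delta's stats, so its cost is independent of table size; that is the property that would make per-DML updates cheaper than rerunning `ANALYZE TABLE COMPUTE STATISTICS`.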



--
This message was sent by Atlassian Jira
(v8.20.10#820010)