[ 
https://issues.apache.org/jira/browse/SPARK-44817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759229#comment-17759229
 ] 

Rakesh Raushan edited comment on SPARK-44817 at 8/26/23 9:02 AM:
-----------------------------------------------------------------

[~gurwls223]  [~cloud_fan] 

Added SPIP Document.

Link for the document : 
[https://docs.google.com/document/d/1CNPWg_L1fxfB4d2m6xfizRyYRoWS2uPCwTKzhL2fwaQ/edit?usp=sharing]


was (Author: rakson):
Added SPIP Document.

Link for the document : 
https://docs.google.com/document/d/1CNPWg_L1fxfB4d2m6xfizRyYRoWS2uPCwTKzhL2fwaQ/edit?usp=sharing

> Incremental Stats Collection
> ----------------------------
>
>                 Key: SPARK-44817
>                 URL: https://issues.apache.org/jira/browse/SPARK-44817
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0, 4.0.0
>            Reporter: Rakesh Raushan
>            Priority: Major
>
> Spark's Cost Based Optimizer is dependent on the table and column statistics.
> After every execution of DML query, table and column stats are invalidated if 
> auto update of stats collection is not turned on. To keep stats updated we 
> need to run `ANALYZE TABLE COMPUTE STATISTICS` command which is very 
> expensive. It is not feasible to run this command after every DML query.
> Instead, we can incrementally update the stats during each DML query run 
> itself. This way our table and column stats would be fresh at all the time 
> and CBO benefits can be applied. Initially, we can only update table level 
> stats and gradually start updating column level stats as well.
> *Pros:*
> 1. Optimize queries over table which is updated frequently.
> 2. Saves Compute cycles by removing dependency over `ANALYZE TABLE COMPUTE 
> STATISTICS` for updating stats.
> [SPIP Document 
> |https://docs.google.com/document/d/1CNPWg_L1fxfB4d2m6xfizRyYRoWS2uPCwTKzhL2fwaQ/edit?usp=sharing]
>  added



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to