[
https://issues.apache.org/jira/browse/SPARK-44817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759229#comment-17759229
]
Rakesh Raushan edited comment on SPARK-44817 at 8/26/23 9:02 AM:
-----------------------------------------------------------------
[~gurwls223] [~cloud_fan]
Added SPIP Document.
Link to the document:
[https://docs.google.com/document/d/1CNPWg_L1fxfB4d2m6xfizRyYRoWS2uPCwTKzhL2fwaQ/edit?usp=sharing]
was (Author: rakson):
Added SPIP Document.
Link for the document :
https://docs.google.com/document/d/1CNPWg_L1fxfB4d2m6xfizRyYRoWS2uPCwTKzhL2fwaQ/edit?usp=sharing
> Incremental Stats Collection
> ----------------------------
>
> Key: SPARK-44817
> URL: https://issues.apache.org/jira/browse/SPARK-44817
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0, 4.0.0
> Reporter: Rakesh Raushan
> Priority: Major
>
> Spark's cost-based optimizer (CBO) depends on table and column statistics.
> After every DML query, table and column stats are invalidated unless
> automatic stats updates are turned on. To keep stats current, we
> need to run the `ANALYZE TABLE COMPUTE STATISTICS` command, which is very
> expensive. Running this command after every DML query is not feasible.
> Instead, we can incrementally update the stats during each DML query run
> itself. This way, table and column stats stay fresh at all times
> and the CBO's benefits can be applied. Initially, we can update only
> table-level stats and gradually start updating column-level stats as well.
> *Pros:*
> 1. Optimizes queries over tables that are updated frequently.
> 2. Saves compute cycles by removing the dependency on `ANALYZE TABLE COMPUTE
> STATISTICS` for updating stats.
> [SPIP Document|https://docs.google.com/document/d/1CNPWg_L1fxfB4d2m6xfizRyYRoWS2uPCwTKzhL2fwaQ/edit?usp=sharing] added
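
The table-level part of the proposal in the description above could be sketched roughly as follows. This is plain Python and only illustrates the idea; it is not taken from the SPIP document or from Spark's code. The names `TableStats` and `apply_delta` are hypothetical, and the sketch assumes each DML query can report how many rows and bytes it added or removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    """Hypothetical table-level statistics (not Spark's own catalog class)."""
    row_count: int
    size_in_bytes: int


def apply_delta(stats: TableStats, rows_delta: int, bytes_delta: int) -> TableStats:
    """Return updated stats after one DML query, without rescanning the table.

    Positive deltas model an INSERT, negative deltas a DELETE.
    """
    new_rows = stats.row_count + rows_delta
    new_bytes = stats.size_in_bytes + bytes_delta
    if new_rows < 0 or new_bytes < 0:
        raise ValueError("delta would make statistics negative")
    return TableStats(row_count=new_rows, size_in_bytes=new_bytes)


stats = TableStats(row_count=1_000, size_in_bytes=64_000)
stats = apply_delta(stats, rows_delta=250, bytes_delta=16_000)  # INSERT of 250 rows
stats = apply_delta(stats, rows_delta=-50, bytes_delta=-3_200)  # DELETE of 50 rows
print(stats)  # TableStats(row_count=1200, size_in_bytes=76800)
```

Row count and size are simple sums, so they can be maintained this way. Column-level stats are harder: a minimum or maximum can be widened on insert, but after a delete it may need a rescan, which may be one reason to start with table-level stats only.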
--
This message was sent by Atlassian Jira
(v8.20.10#820010)