[
https://issues.apache.org/jira/browse/SPARK-38258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
gabrywu updated SPARK-38258:
----------------------------
Description:
As we all know, table & column statistics are very important to the Spark SQL
optimizer; however, today we have to collect & update them manually using
{code:java}
ANALYZE TABLE tableName COMPUTE STATISTICS{code}
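For concreteness, this is the manual workflow today (a minimal Scala sketch; the table {{t}}, source {{src}}, and columns {{id}}, {{name}} are hypothetical names used only for illustration):
{code:scala}
// Manual workflow today: statistics only become visible to the optimizer
// after an explicit ANALYZE TABLE run following the write.
spark.sql("INSERT OVERWRITE TABLE t SELECT * FROM src")              // does NOT update stats
spark.sql("ANALYZE TABLE t COMPUTE STATISTICS")                       // table-level stats
spark.sql("ANALYZE TABLE t COMPUTE STATISTICS FOR COLUMNS id, name")  // column-level stats

// What the optimizer will actually see for the table:
println(spark.table("t").queryExecution.optimizedPlan.stats)
{code}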
It's a little inconvenient, so why can't we collect & update statistics
automatically when a Spark stage finishes?
For example, when an insert overwrite table statement finishes, we can update the
corresponding table statistics using the SQL metrics of that write. In subsequent
queries, the Spark SQL optimizer can then use these statistics.
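A rough sketch of one way this could look (purely illustrative, not a design: the listener class, the {{extractTargetTable}} helper, and re-running {{ANALYZE TABLE}} instead of writing stats from the SQL metrics directly are all assumptions):
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

// Illustrative only: after a successful write, refresh the target table's
// statistics so that later queries are planned with up-to-date stats.
class AutoAnalyzeListener(spark: SparkSession) extends QueryExecutionListener {

  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    extractTargetTable(qe).foreach { table =>
      // A real implementation could instead derive rowCount / sizeInBytes from
      // the write command's SQL metrics and update the catalog directly,
      // avoiding the extra scan that ANALYZE TABLE performs.
      spark.sql(s"ANALYZE TABLE $table COMPUTE STATISTICS")
    }
  }

  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = ()

  // Hypothetical helper: walk qe.logicalPlan looking for an insert/CTAS command
  // and return the target table name, if any.
  private def extractTargetTable(qe: QueryExecution): Option[String] = None
}

// Registration (e.g. at session startup):
// spark.listenerManager.register(new AutoAnalyzeListener(spark))
{code}
The registration line shows how such a listener could be wired in today without core changes; a built-in version would presumably be controlled by a config flag and could reuse the write command's metrics instead of rescanning the table.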
So what do you think of it, [~yumwang]? Is it reasonable?
was:
As we all know, table & column statistics are very important to the Spark SQL
optimizer; however, today we have to collect & update them manually using
{code:java}
ANALYZE TABLE tableName COMPUTE STATISTICS{code}
It's a little inconvenient, so why can't we collect & update statistics
automatically when a Spark stage finishes?
For example, when an insert overwrite table statement finishes, we can update the
corresponding table statistics using the SQL metrics of that write. In the next
queries, the Spark SQL optimizer can then use these statistics.
So what do you think of it, [~yumwang]?
> [proposal] collect & update statistics automatically when spark SQL is running
> ------------------------------------------------------------------------------
>
> Key: SPARK-38258
> URL: https://issues.apache.org/jira/browse/SPARK-38258
> Project: Spark
> Issue Type: Wish
> Components: Spark Core, SQL
> Affects Versions: 3.0.0, 3.1.0, 3.2.0
> Reporter: gabrywu
> Priority: Minor
>
> As we all know, table & column statistics are very important to the Spark SQL
> optimizer; however, today we have to collect & update them manually using
> {code:java}
> ANALYZE TABLE tableName COMPUTE STATISTICS{code}
>
> It's a little inconvenient, so why can't we collect & update statistics
> automatically when a Spark stage finishes?
> For example, when an insert overwrite table statement finishes, we can update
> the corresponding table statistics using the SQL metrics of that write. In
> subsequent queries, the Spark SQL optimizer can then use these statistics.
> So what do you think of it, [~yumwang]? Is it reasonable?
--
This message was sent by Atlassian Jira
(v8.20.1#820001)