[ https://issues.apache.org/jira/browse/SPARK-30616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138840#comment-17138840 ]

Apache Spark commented on SPARK-30616:
--------------------------------------

User 'sap1ens' has created a pull request for this issue:
https://github.com/apache/spark/pull/28852

> Introduce TTL config option for SQL Metadata Cache
> --------------------------------------------------
>
>                 Key: SPARK-30616
>                 URL: https://issues.apache.org/jira/browse/SPARK-30616
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yaroslav Tkachenko
>            Priority: Major
>
> From 
> [documentation|https://spark.apache.org/docs/2.4.4/sql-data-sources-parquet.html#metadata-refreshing]:
> {quote}Spark SQL caches Parquet metadata for better performance. When Hive 
> metastore Parquet table conversion is enabled, metadata of those converted 
> tables are also cached. If these tables are updated by Hive or other external 
> tools, you need to refresh them manually to ensure consistent metadata.
> {quote}
> Unfortunately, simply submitting "REFRESH TABLE" commands can be very 
> cumbersome. With frequently generated new Parquet files, hundreds of tables, 
> and dozens of users querying the data (and expecting up-to-date results), 
> manually refreshing the metadata for each table is not a practical solution. 
> And this is a very common use case for streaming ingestion of data.
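> As a rough illustration (assuming a SparkSession named "spark" and purely 
> hypothetical table names), keeping the cache consistent today means issuing 
> a refresh per table:
> {code:scala}
> // Sketch of the current workaround: every table that may have received new
> // Parquet files has to be refreshed explicitly before querying, otherwise
> // cached metadata can return stale results.
> val tables = Seq("warehouse.events", "warehouse.clicks", "warehouse.orders")
> tables.foreach(t => spark.sql(s"REFRESH TABLE $t"))
> {code}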
> I propose introducing a new option in Spark (something like 
> "spark.sql.hive.filesourcePartitionFileCacheTTL") that controls the TTL of 
> this metadata cache. It can be disabled by default (-1), so it doesn't change 
> the existing behaviour.
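> A minimal usage sketch, assuming the proposed key and a TTL expressed in 
> seconds (neither exists in Spark today; both are part of this proposal):
> {code:scala}
> // With a TTL set, entries in the metadata cache would expire automatically,
> // so newly written Parquet files become visible without a manual
> // REFRESH TABLE on every table.
> spark.conf.set("spark.sql.hive.filesourcePartitionFileCacheTTL", "600")
> // The default of -1 keeps the existing behaviour: cached metadata never expires.
> {code}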



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
