[
https://issues.apache.org/jira/browse/SPARK-33729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun resolved SPARK-33729.
-----------------------------------
Fix Version/s: 3.1.0
Resolution: Fixed
Issue resolved by pull request 30699
[https://github.com/apache/spark/pull/30699]
> When refreshing cache, Spark should not use cached plan when recaching data
> ---------------------------------------------------------------------------
>
> Key: SPARK-33729
> URL: https://issues.apache.org/jira/browse/SPARK-33729
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Chao Sun
> Assignee: Chao Sun
> Priority: Major
> Fix For: 3.1.0
>
>
> Currently, when a cache is refreshed, e.g., via the "REFRESH TABLE" command,
> Spark calls the {{refreshTable}} method in {{CatalogImpl}}:
> {code}
> override def refreshTable(tableName: String): Unit = {
>   val tableIdent = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
>   val tableMetadata = sessionCatalog.getTempViewOrPermanentTableMetadata(tableIdent)
>   val table = sparkSession.table(tableIdent)
>
>   if (tableMetadata.tableType == CatalogTableType.VIEW) {
>     // Temp or persistent views: refresh (or invalidate) any metadata/data cached
>     // in the plan recursively.
>     table.queryExecution.analyzed.refresh()
>   } else {
>     // Non-temp tables: refresh the metadata cache.
>     sessionCatalog.refreshTable(tableIdent)
>   }
>
>   // If this table is cached as an InMemoryRelation, drop the original
>   // cached version and make the new version cached lazily.
>   val cache = sparkSession.sharedState.cacheManager.lookupCachedData(table)
>
>   // uncache the logical plan.
>   // note this is a no-op for the table itself if it's not cached, but will invalidate all
>   // caches referencing this table.
>   sparkSession.sharedState.cacheManager.uncacheQuery(table, cascade = true)
>
>   if (cache.nonEmpty) {
>     // save the cache name and cache level for recreation
>     val cacheName = cache.get.cachedRepresentation.cacheBuilder.tableName
>     val cacheLevel = cache.get.cachedRepresentation.cacheBuilder.storageLevel
>
>     // recache with the same name and cache level.
>     sparkSession.sharedState.cacheManager.cacheQuery(table, cacheName, cacheLevel)
>   }
> }
> {code}
> Note that {{table}} is created before the table relation cache is
> cleared, and is used later in {{cacheQuery}}. This is incorrect, since it still
> refers to the cached table relation, which could be stale.
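
The ordering problem described above can be illustrated with a small toy model. This is only a sketch and does not use Spark: {{ToyCatalog}}, {{Relation}}, {{refreshStale}} and {{refreshFresh}} are hypothetical names invented for illustration, and the "relation cache" is a plain map. It assumes that resolving a table reuses a cached relation until that cache is invalidated, which is the behavior the description relies on.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a resolved table relation (not a Spark class).
final case class Relation(table: String, version: Int)

// Hypothetical toy catalog: tracks the latest data version per table and
// keeps a relation cache that is reused until explicitly invalidated.
final class ToyCatalog {
  private val versions = mutable.Map.empty[String, Int]
  private val relationCache = mutable.Map.empty[String, Relation]

  // Simulates new data being written to the table.
  def write(table: String): Unit =
    versions(table) = versions.getOrElse(table, 0) + 1

  // Resolves a table, serving the cached relation when one exists.
  def resolve(table: String): Relation =
    relationCache.getOrElseUpdate(table, Relation(table, versions(table)))

  // Clears the cached relation for the table.
  def invalidate(table: String): Unit = {
    relationCache.remove(table)
    ()
  }
}

object RefreshDemo {
  // Problematic ordering: the plan is resolved BEFORE the relation cache is
  // cleared, so the plan handed to the recache step may be stale.
  def refreshStale(catalog: ToyCatalog, table: String): Relation = {
    val plan = catalog.resolve(table)
    catalog.invalidate(table)
    plan
  }

  // Alternative ordering: clear the relation cache first, then resolve again
  // so the plan handed to the recache step reflects the current data.
  def refreshFresh(catalog: ToyCatalog, table: String): Relation = {
    catalog.invalidate(table)
    catalog.resolve(table)
  }

  // Builds a catalog whose cached relation (version 1) lags the data (version 2).
  def staleCatalog(): ToyCatalog = {
    val catalog = new ToyCatalog
    catalog.write("t")   // data version 1
    catalog.resolve("t") // relation cache now holds version 1
    catalog.write("t")   // data version 2, cache still holds version 1
    catalog
  }

  def main(args: Array[String]): Unit = {
    assert(refreshStale(staleCatalog(), "t").version == 1) // stale plan
    assert(refreshFresh(staleCatalog(), "t").version == 2) // fresh plan
  }
}
```

In this model, resolving the table again after invalidation is what avoids the stale plan. Whether the actual change in pull request 30699 takes this exact shape is not stated in this message.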