[
https://issues.apache.org/jira/browse/SPARK-54025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18032869#comment-18032869
]
Gengliang Wang commented on SPARK-54025:
----------------------------------------
cc [~vli-databricks]
> Support recaching when a table is written via a different table
> implementation (V1 or V2)
> -----------------------------------------------------------------------------------------
>
> Key: SPARK-54025
> URL: https://issues.apache.org/jira/browse/SPARK-54025
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.1.0
> Reporter: Gengliang Wang
> Priority: Major
>
> When a table is cached using one table implementation (e.g., V2) and written
> through the other (e.g., V1), Spark may not automatically trigger recaching.
> As a result, the cached data can become stale even though the underlying
> table content has changed.
>
> This issue arises because the current recaching mechanism does not
> consistently handle cross-implementation writes. Given that the community is
> actively working on Data Source V2 (DSV2), many data sources are expected to
> have both V1 and V2 implementations for a period of time, making this issue
> more likely to occur in practice.
>
> *Proposed Fix:*
> Enhance the cache invalidation logic to detect writes that occur through a
> different table implementation (V1 ↔ V2) and trigger recaching accordingly.
>
> *Expected Outcome:*
> * Cached data remains up to date when a table is written through either V1
> or V2 paths.
> * Both logical-plan-based and file-path-based recaching continue to work as
> expected for V1 and V2 connectors.
>
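To make the staleness scenario in the description concrete, here is a minimal toy sketch in plain Python. It is not Spark code and does not reflect the actual cache manager: `Relation`, `ToyCache`, `match_across_impls`, and `run` are hypothetical names invented for illustration. It assumes a cache that, by default, refreshes an entry only when the written relation equals the cached relation exactly, and shows how also matching on the table identifier would cover a write made through the other implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """Hypothetical logical relation: one table may surface as V1 or V2."""
    table: str  # canonical table identifier, e.g. "db.t"
    impl: str   # "v1" or "v2"


@dataclass
class ToyCache:
    """Toy cache; an illustration only, not Spark's cache manager."""
    match_across_impls: bool
    entries: dict = field(default_factory=dict)  # Relation -> cached rows

    def cache(self, rel, storage):
        # Snapshot the table's current rows under the cached relation.
        self.entries[rel] = list(storage[rel.table])

    def on_write(self, written, storage):
        # Refresh every entry that the write is considered to affect.
        for rel in self.entries:
            same_plan = rel == written
            same_table = self.match_across_impls and rel.table == written.table
            if same_plan or same_table:
                self.entries[rel] = list(storage[rel.table])


def run(match_across_impls):
    storage = {"db.t": [1, 2]}
    cache = ToyCache(match_across_impls)
    v2 = Relation("db.t", "v2")
    cache.cache(v2, storage)                         # cached via V2
    storage["db.t"].append(3)                        # data changes...
    cache.on_write(Relation("db.t", "v1"), storage)  # ...via a V1 write
    return cache.entries[v2]


print(run(match_across_impls=False))  # exact-plan matching only
print(run(match_across_impls=True))   # also match on table identity
```

With exact-plan matching the V2 entry keeps its old snapshot after the V1 write; with table-identity matching it is refreshed. How the real fix identifies "the same table" across V1 and V2 plans is outside what this sketch assumes.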
--
This message was sent by Atlassian Jira
(v8.20.10#820010)