Vitalii Li created SPARK-54216:
----------------------------------

             Summary: Cache refresh returns stale data for DataSource V2 tables 
with immutable Table instances
                 Key: SPARK-54216
                 URL: https://issues.apache.org/jira/browse/SPARK-54216
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 4.1.0
            Reporter: Vitalii Li
             Fix For: 4.1.0


*Problem*

After modifying a V2 table and calling `refreshTable()` or `recacheByPlan()`, 
cached queries keep returning the data from before the modification.

*Root Cause*

`CacheManager.recacheByCondition()` re-executes the old cached plan, which 
still contains an immutable `Table` instance pointing to a previous snapshot. 
The rebuilt cache is therefore populated with stale data.

V1 tables don't have this issue because they use mutable file indexes that 
implicitly refresh.
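
The mechanism can be illustrated with a toy model. This is a minimal sketch, 
not Spark code: `Snapshot`, `ToyCatalog` and `ToyPlan` are hypothetical names 
standing in for an immutable `Table` instance, a V2 catalog and a cached plan.

{code:scala}
// Toy model, not Spark code: every name here is hypothetical.
final case class Snapshot(id: Long, rows: Vector[(Int, String)])

// The catalog tracks the current snapshot; each write produces a new one.
final class ToyCatalog {
  private var current = Snapshot(0L, Vector((1, "a"), (2, "b")))
  def load(): Snapshot = current // immutable handle to the current snapshot
  def insert(row: (Int, String)): Unit =
    current = Snapshot(current.id + 1, current.rows :+ row)
}

// A "plan" that captured its table handle when it was resolved.
final case class ToyPlan(table: Snapshot) {
  def execute(): Vector[(Int, String)] = table.rows
}

object StaleRecacheDemo {
  def main(args: Array[String]): Unit = {
    val catalog = new ToyCatalog
    val cachedPlan = ToyPlan(catalog.load()) // plan captured before the write
    catalog.insert((3, "new"))               // table moves to a new snapshot

    // Re-executing the old plan still reads the old snapshot.
    println(s"old plan rows: ${cachedPlan.execute().size}")
    // Resolving the table again picks up the new snapshot.
    println(s"re-resolved plan rows: ${ToyPlan(catalog.load()).execute().size}")
  }
}
{code}

In this model the old plan yields 2 rows and the re-resolved plan yields 3, 
which mirrors the stale read described above.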

*Reproduce*

{code:scala}
spark.table("v2_table").cache().count()  // Cache populated
spark.sql("INSERT INTO v2_table VALUES (3, 'new')")  // Modify table
spark.catalog.refreshTable("v2_table")  // Refresh cache
spark.table("v2_table").show()  // BUG: Shows old data
{code}

*Solution*

- Modify `recacheByCondition` to accept an optional `freshPlan` parameter
- Use the fresh plan (resolved against the current snapshot) for re-execution 
instead of the old cached plan
- Update the cached plan entry to use the fresh plan
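
The steps above could look roughly like the following toy sketch. It is not 
Spark's `CacheManager`: the `Plan`, `CacheEntry` and `ToyCacheManager` types 
and the exact `recacheByCondition` signature are assumptions made for 
illustration only.

{code:scala}
import scala.collection.mutable

// Toy model, not Spark's CacheManager: names and signatures are hypothetical.
final case class Plan(tableName: String, snapshotId: Long, rows: Vector[Int]) {
  def execute(): Vector[Int] = rows
}

final case class CacheEntry(plan: Plan, data: Vector[Int])

final class ToyCacheManager {
  private val entries = mutable.ArrayBuffer.empty[CacheEntry]

  def cache(plan: Plan): Unit = entries.append(CacheEntry(plan, plan.execute()))

  def lookup(tableName: String): Option[CacheEntry] =
    entries.find(_.plan.tableName == tableName)

  // Rebuild every matching entry. If freshPlan is supplied, it is executed and
  // stored in place of the old plan; otherwise the old plan is re-executed.
  def recacheByCondition(
      condition: Plan => Boolean,
      freshPlan: Option[Plan] = None): Unit = {
    for (i <- entries.indices if condition(entries(i).plan)) {
      val plan = freshPlan.getOrElse(entries(i).plan)
      entries(i) = CacheEntry(plan, plan.execute())
    }
  }
}

object FreshPlanDemo {
  def main(args: Array[String]): Unit = {
    val cm = new ToyCacheManager
    cm.cache(Plan("v2_table", 1L, Vector(1, 2)))

    // Without a fresh plan the old snapshot is re-read.
    cm.recacheByCondition(_.tableName == "v2_table")
    println(cm.lookup("v2_table").map(_.data.size))

    // With a fresh plan the entry is rebuilt from the current snapshot.
    val fresh = Plan("v2_table", 2L, Vector(1, 2, 3))
    cm.recacheByCondition(_.tableName == "v2_table", Some(fresh))
    println(cm.lookup("v2_table").map(_.data.size))
  }
}
{code}

Storing the fresh plan in the entry, not just its result, matters in this 
model: a later recache without `freshPlan` would otherwise fall back to the 
old snapshot again.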

*Impact*

Affects Delta Lake, Iceberg, and any V2 table with immutable Table instances.
