wzx140 commented on issue #10493:
URL: https://github.com/apache/iceberg/issues/10493#issuecomment-3744162191

   @singhpk234 @findepi  Here's a concrete case that illustrates this race 
condition: In a Spark task, I execute insert overwrite table followed 
immediately by select table. If an asynchronous Spark listener calls loadTable 
during the insert overwrite commit process, the select query will inevitably 
return stale data, which would be very confusing to users. Here's the timeline:
   ```
     1. spark_write calls loadTable for table^1 with snapshot1 ⇒ 
cache.put(table^1[snapshot1])
     2. Time passes, table^1 expires in cache ⇒ cache.expire(table^1[snapshot1])
     3. Listener calls loadTable for table^2 with snapshot1 ⇒ 
cache.put(table^2[snapshot1])
     4. spark_write commits using table^1, advancing to snapshot2
     5. spark_scan calls loadTable ⇒ cache.get() returns table^2[snapshot1] ⚠️ 
The snapshot is stale!
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to