Re: [PR] Spark: time-travel filter fails on renamed columns in BaseDistributedDataScan [iceberg]

via GitHub Wed, 03 Jun 2026 07:09:54 -0700


lilei1128 commented on code in PR #16523:
URL: https://github.com/apache/iceberg/pull/16523#discussion_r3349149130



##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java:
##########
@@ -296,10 +296,31 @@ public ScanBuilder newScanBuilder(CaseInsensitiveStringMap options) {
       icebergTable.refresh();
     }
 
+    // When snapshot-id is passed via options (e.g. DataFrameReader.option()) but this SparkTable
+    // was not constructed with a snapshotId field, resolve the schema against the requested
+    // snapshot so that filter column names are validated against the correct snapshot schema.
+    Long scanSnapshotId = snapshotId;

Review Comment:
   You're right: the defensive code is dead. I've removed the scanSnapshotId block and
   scanSchemaFor(), and reverted newScanBuilder() to use snapshotSchema() directly.
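To show why the failure only appears after a rename, here is a minimal, self-contained sketch of name-to-field-ID resolution. It assumes a simplified model where a schema is just a map from column names to stable field IDs and a rename keeps the ID. The class `SnapshotSchemaBinding`, the `bindFilterColumn` helper and the column names `category`/`product_category` are hypothetical illustrations, not Iceberg's API.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model of schema resolution; this is NOT the Iceberg API.
public class SnapshotSchemaBinding {

  // A schema maps column names to stable field IDs. Renaming a column keeps its ID.
  record Schema(int schemaId, Map<String, Integer> nameToFieldId) {
    Optional<Integer> findFieldId(String name) {
      return Optional.ofNullable(nameToFieldId.get(name));
    }
  }

  // Validates a filter column name against the given schema and returns its field ID.
  static int bindFilterColumn(Schema schema, String columnName) {
    return schema
        .findFieldId(columnName)
        .orElseThrow(
            () ->
                new IllegalArgumentException(
                    "Cannot find field '" + columnName + "' in schema " + schema.schemaId()));
  }

  public static void main(String[] args) {
    // Hypothetical schema 0: current when the old snapshot was written.
    Schema snapshotSchema = new Schema(0, Map.of("id", 1, "category", 2));
    // Hypothetical schema 1: current after renaming "category" to "product_category".
    Schema currentSchema = new Schema(1, Map.of("id", 1, "product_category", 2));

    // A time-travel filter on "category" binds fine against the snapshot's schema.
    System.out.println(bindFilterColumn(snapshotSchema, "category"));

    // Binding the same filter against the current schema fails, mirroring the reported bug.
    try {
      bindFilterColumn(currentSchema, "category");
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Because both schemas map the column to the same field ID, resolving names against the snapshot's schema is enough to locate the same data; only the name lookup depends on which schema is chosen.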
   The actual fix is entirely in BaseDistributedDataScan: overriding useSnapshotSchema() to
   return true and changing specCache() to use specs() instead of table().specs().
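The fix relies on an overridable hook that decides whether a time-travel scan resolves the snapshot's schema or the table's current one. The sketch below models that template-method pattern with hypothetical classes (`Scan`, `DefaultScan`, `SnapshotAwareScan`, `TableMetadata`). Only the hook name `useSnapshotSchema()` comes from the comment above; Iceberg's real scan classes may be structured differently, and the `specCache()` change is not modeled.

```java
import java.util.Map;

// Hypothetical model of a scan hierarchy with a schema hook; this is NOT Iceberg's code.
public class SnapshotSchemaHook {

  record Schema(int schemaId) {}

  // Stand-in for table metadata: schemas by ID, and the schema ID each snapshot was written with.
  record TableMetadata(
      int currentSchemaId, Map<Integer, Schema> schemas, Map<Long, Integer> snapshotSchemaIds) {}

  abstract static class Scan {
    private final TableMetadata table;
    private final Long snapshotId; // null means "read the latest snapshot"

    Scan(TableMetadata table, Long snapshotId) {
      this.table = table;
      this.snapshotId = snapshotId;
    }

    // Hook: subclasses opt in to using the time-travel snapshot's schema.
    protected boolean useSnapshotSchema() {
      return false;
    }

    Schema schema() {
      if (snapshotId != null && useSnapshotSchema()) {
        return table.schemas().get(table.snapshotSchemaIds().get(snapshotId));
      }
      return table.schemas().get(table.currentSchemaId());
    }
  }

  // Keeps the default, so filters are bound against the current schema.
  static class DefaultScan extends Scan {
    DefaultScan(TableMetadata table, Long snapshotId) {
      super(table, snapshotId);
    }
  }

  // Mirrors the described fix: override the hook so time travel uses the snapshot schema.
  static class SnapshotAwareScan extends Scan {
    SnapshotAwareScan(TableMetadata table, Long snapshotId) {
      super(table, snapshotId);
    }

    @Override
    protected boolean useSnapshotSchema() {
      return true;
    }
  }

  public static void main(String[] args) {
    // Hypothetical table: snapshot 100 used schema 0, snapshot 200 uses current schema 1.
    TableMetadata table =
        new TableMetadata(
            1, Map.of(0, new Schema(0), 1, new Schema(1)), Map.of(100L, 0, 200L, 1));
    System.out.println(new DefaultScan(table, 100L).schema().schemaId());
    System.out.println(new SnapshotAwareScan(table, 100L).schema().schemaId());
  }
}
```

Putting the decision in a single overridable method lets one scan subclass change its behavior without touching the shared schema-resolution logic in the base class.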
   
   Could you help double-check the changes to ensure correctness? Thanks a lot!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
