Re: [PR] Spark: time-travel filter fails on renamed columns in BaseDistributedDataScan [iceberg]

via GitHub Thu, 28 May 2026 15:19:58 -0700


huan233usc commented on code in PR #16523:
URL: https://github.com/apache/iceberg/pull/16523#discussion_r3320988706



##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java:
##########
@@ -296,10 +296,31 @@ public ScanBuilder 
newScanBuilder(CaseInsensitiveStringMap options) {
       icebergTable.refresh();
     }
 
+    // When snapshot-id is passed via options (e.g. DataFrameReader.option()) 
but this SparkTable
+    // was not constructed with a snapshotId field, resolve the schema against 
the requested
+    // snapshot so that filter column names are validated against the correct 
snapshot schema.
+    Long scanSnapshotId = snapshotId;

Review Comment:
   Do we really need this?
   
   From what I know so far --
   
   -- `Spark` calls `IcebergSource.extractIdentifier(options)`.
   -- In `IcebergSource.catalogAndIdentifier`, snapshot-id is read from the 
options and encoded into the identifier name as db.table#snapshot_id_X.
   -- `Spark` then calls 
`SparkCatalog.loadTable(Identifier("db.table#snapshot_id_X"))`.
   -- `SparkCatalog` parses the #snapshot_id_X selector and returns a 
SparkTable built via `copyWithSnapshotId(X)` — so SparkTable.snapshotId is 
already X.
   
   
   When Spark eventually calls newScanBuilder(options), this.snapshotId != 
null, snapshotSchema() already resolves against the right schema, and the new 
if (scanSnapshotId == null && branch == null) { options.get(SNAPSHOT_ID) ... } 
branch is never entered.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Spark: time-travel filter fails on renamed columns in BaseDistributedDataScan [iceberg]

Reply via email to