lilei1128 commented on code in PR #16523:
URL: https://github.com/apache/iceberg/pull/16523#discussion_r3349149130
##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java:
##########
@@ -296,10 +296,31 @@ public ScanBuilder newScanBuilder(CaseInsensitiveStringMap options) {
icebergTable.refresh();
}
+ // When snapshot-id is passed via options (e.g. DataFrameReader.option()) but this SparkTable
+ // was not constructed with a snapshotId field, resolve the schema against the requested
+ // snapshot so that filter column names are validated against the correct snapshot schema.
+ Long scanSnapshotId = snapshotId;
Review Comment:
You're right. The defensive code is dead code, so I've removed the
scanSnapshotId block and scanSchemaFor(), and reverted newScanBuilder() to use
snapshotSchema() directly.
The actual fix is entirely in BaseDistributedDataScan: overriding
useSnapshotSchema() to return true and changing specCache() to use specs()
instead of table().specs().
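To make the two changes easier to review together, here is a simplified, self-contained model of the intended behavior. Only the method names `useSnapshotSchema()`, `specs()` and `specCache()` come from this thread; `Table`, `Scan`, `DistributedDataScan`, `Schema`, `PartitionSpec`, `bind()` and `sampleTable()` are hypothetical stand-ins, not Iceberg's actual classes or signatures. The sketch also assumes that a scan's `specs()` rebinds partition specs to the scan's (snapshot) schema when time traveling, which is why reading specs from the scan rather than from `table().specs()` matters once `useSnapshotSchema()` returns true.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the fix; NOT Iceberg's real scan classes.
public class TimeTravelScanSketch {

  record Schema(int schemaId, List<String> columns) {}

  record PartitionSpec(int specId, String sourceColumn, int boundSchemaId) {
    // "Binding" here = validating the source column against a schema and recording which one.
    PartitionSpec bind(Schema schema) {
      if (!schema.columns().contains(sourceColumn)) {
        throw new IllegalArgumentException(
            "Cannot find source column " + sourceColumn + " in schema " + schema.schemaId());
      }
      return new PartitionSpec(specId, sourceColumn, schema.schemaId());
    }
  }

  static class Table {
    final Map<Integer, Schema> schemas = new HashMap<>();
    final Map<Long, Integer> snapshotSchemaIds = new HashMap<>();
    final Map<Integer, PartitionSpec> specs = new HashMap<>();
    int currentSchemaId;
    long currentSnapshotId;

    Schema schema() { return schemas.get(currentSchemaId); }
    Schema schemaFor(long snapshotId) { return schemas.get(snapshotSchemaIds.get(snapshotId)); }
    Map<Integer, PartitionSpec> specs() { return specs; }
  }

  static class Scan {
    private final Table table;
    private final Long snapshotId; // null = scan the current snapshot

    Scan(Table table, Long snapshotId) {
      this.table = table;
      this.snapshotId = snapshotId;
    }

    // Default: keep the table's current schema even when time traveling.
    protected boolean useSnapshotSchema() { return false; }

    private boolean isTimeTravel() {
      return snapshotId != null && snapshotId.longValue() != table.currentSnapshotId;
    }

    // Schema used to resolve filter column names for this scan.
    Schema schema() {
      return useSnapshotSchema() && isTimeTravel() ? table.schemaFor(snapshotId) : table.schema();
    }

    // Specs as seen by this scan (assumed behavior): rebound to the scan schema on time travel.
    Map<Integer, PartitionSpec> specs() {
      if (!useSnapshotSchema() || !isTimeTravel()) {
        return table.specs();
      }
      Map<Integer, PartitionSpec> rebound = new HashMap<>();
      table.specs().forEach((id, spec) -> rebound.put(id, spec.bind(schema())));
      return rebound;
    }
  }

  static class DistributedDataScan extends Scan {
    DistributedDataScan(Table table, Long snapshotId) { super(table, snapshotId); }

    // Change 1: resolve against the snapshot's schema when a snapshot is requested.
    @Override
    protected boolean useSnapshotSchema() { return true; }

    // Change 2: take specs from the scan instead of the live table's specs.
    Map<Integer, PartitionSpec> specCache() { return specs(); }
  }

  static Table sampleTable() {
    Table table = new Table();
    table.schemas.put(0, new Schema(0, List.of("id", "category")));
    table.schemas.put(1, new Schema(1, List.of("id", "kind"))); // schema 1 renamed category -> kind
    table.snapshotSchemaIds.put(100L, 0); // older snapshot written with schema 0
    table.snapshotSchemaIds.put(200L, 1); // current snapshot
    table.currentSchemaId = 1;
    table.currentSnapshotId = 200L;
    table.specs.put(0, new PartitionSpec(0, "id", 1)); // spec currently bound to schema 1
    return table;
  }

  public static void main(String[] args) {
    Table table = sampleTable();
    DistributedDataScan timeTravel = new DistributedDataScan(table, 100L);
    // Filter names resolve against schema 0, so "category" is still valid.
    System.out.println(timeTravel.schema().columns().contains("category")); // true
    // Specs come from the scan and are rebound to schema 0.
    System.out.println(timeTravel.specCache().get(0).boundSchemaId()); // 0
    // A scan that keeps the default (false) still sees the current schema.
    System.out.println(new Scan(table, 100L).schema().columns().contains("category")); // false
  }
}
```

In this model the two overrides only make sense together: if `useSnapshotSchema()` returned true but specs were still read from the live table, filters would be resolved against the snapshot schema while partition specs stayed bound to the current one.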
Could you double-check that these changes are correct? Thanks a lot!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]