huan233usc commented on code in PR #16523:
URL: https://github.com/apache/iceberg/pull/16523#discussion_r3320988706
##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java:
##########
@@ -296,10 +296,31 @@ public ScanBuilder
newScanBuilder(CaseInsensitiveStringMap options) {
icebergTable.refresh();
}
+ // When snapshot-id is passed via options (e.g. DataFrameReader.option())
but this SparkTable
+ // was not constructed with a snapshotId field, resolve the schema against
the requested
+ // snapshot so that filter column names are validated against the correct
snapshot schema.
+ Long scanSnapshotId = snapshotId;
Review Comment:
Do we really need this?
From what I know so far --
-- `Spark` calls `IcebergSource.extractIdentifier(options)`.
-- In `IcebergSource.catalogAndIdentifier`, snapshot-id is read from the
options and encoded into the identifier name as db.table#snapshot_id_X.
-- `Spark` then calls
`SparkCatalog.loadTable(Identifier("db.table#snapshot_id_X"))`.
-- `SparkCatalog` parses the #snapshot_id_X selector and returns a
SparkTable built via `copyWithSnapshotId(X)` — so SparkTable.snapshotId is
already X.
When Spark eventually calls newScanBuilder(options), this.snapshotId !=
null, snapshotSchema() already resolves against the right schema, and the new
if (scanSnapshotId == null && branch == null) { options.get(SNAPSHOT_ID) ... }
branch is never entered.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]