zeroshade commented on code in PR #1358:
URL: https://github.com/apache/iceberg-go/pull/1358#discussion_r3521444538
##########
table/scanner.go:
##########
@@ -353,6 +340,39 @@ func (scan *Scan) Projection() (*iceberg.Schema, error) {
return schema, nil
}
+func (scan *Scan) effectiveSchema() (*iceberg.Schema, error) {
+ curSchema := scan.metadata.CurrentSchema()
+ var snap *Snapshot
+ if scan.snapshotID != nil {
+ snap = scan.metadata.SnapshotByID(*scan.snapshotID)
+ if snap == nil {
+ return nil, fmt.Errorf("%w: snapshot not found: %d",
ErrInvalidOperation, *scan.snapshotID)
+ }
+ } else if scan.asOfTimestamp != nil {
+ entries := slices.Collect(scan.metadata.SnapshotLogs())
+ for i := len(entries) - 1; i >= 0; i-- {
+ entry := entries[i]
+ if entry.TimestampMs <= *scan.asOfTimestamp {
+ snap =
scan.metadata.SnapshotByID(entry.SnapshotID)
+
+ break
+ }
+ }
+ }
+
+ if snap == nil || snap.SchemaID == nil {
+ return curSchema, nil
Review Comment:
For a default scan with no explicit snapshot/as-of, this returns
`CurrentSchema()` rather than resolving `CurrentSnapshot().SchemaID`. Is that
intentional after schema-only metadata updates? If so, please add a short
comment documenting why default scans differ from
`WithSnapshotID(currentSnapshotID)`.
##########
table/scanner.go:
##########
@@ -353,6 +340,39 @@ func (scan *Scan) Projection() (*iceberg.Schema, error) {
return schema, nil
}
+func (scan *Scan) effectiveSchema() (*iceberg.Schema, error) {
+ curSchema := scan.metadata.CurrentSchema()
+ var snap *Snapshot
+ if scan.snapshotID != nil {
+ snap = scan.metadata.SnapshotByID(*scan.snapshotID)
+ if snap == nil {
+ return nil, fmt.Errorf("%w: snapshot not found: %d",
ErrInvalidOperation, *scan.snapshotID)
+ }
+ } else if scan.asOfTimestamp != nil {
+ entries := slices.Collect(scan.metadata.SnapshotLogs())
+ for i := len(entries) - 1; i >= 0; i-- {
+ entry := entries[i]
+ if entry.TimestampMs <= *scan.asOfTimestamp {
+ snap =
scan.metadata.SnapshotByID(entry.SnapshotID)
+
+ break
+ }
+ }
+ }
+
+ if snap == nil || snap.SchemaID == nil {
+ return curSchema, nil
+ }
+
+ for _, schema := range scan.metadata.Schemas() {
+ if schema.ID == *snap.SchemaID {
+ return schema, nil
+ }
+ }
+
+ return curSchema, nil
Review Comment:
When `snap.SchemaID` is non-nil but no matching schema exists, this falls
back to `CurrentSchema()`. That differs from the legacy no-schema-id fallback
and can silently bind predicates/pruning against the wrong field IDs,
incorrectly dropping candidate manifests/files on malformed or partially-loaded
metadata. Please only fall back when `snap.SchemaID == nil`; if `snap.SchemaID
!= nil` and lookup fails, return an
`ErrInvalidMetadata`/`ErrInvalidOperation`-wrapped error naming the snapshot ID
and missing schema ID. Also add a test for a non-nil-but-unknown snapshot
schema-id.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]