zeroshade commented on code in PR #1358:
URL: https://github.com/apache/iceberg-go/pull/1358#discussion_r3521444538


##########
table/scanner.go:
##########
@@ -353,6 +340,39 @@ func (scan *Scan) Projection() (*iceberg.Schema, error) {
        return schema, nil
 }
 
+func (scan *Scan) effectiveSchema() (*iceberg.Schema, error) {
+       curSchema := scan.metadata.CurrentSchema()
+       var snap *Snapshot
+       if scan.snapshotID != nil {
+               snap = scan.metadata.SnapshotByID(*scan.snapshotID)
+               if snap == nil {
+                       return nil, fmt.Errorf("%w: snapshot not found: %d", 
ErrInvalidOperation, *scan.snapshotID)
+               }
+       } else if scan.asOfTimestamp != nil {
+               entries := slices.Collect(scan.metadata.SnapshotLogs())
+               for i := len(entries) - 1; i >= 0; i-- {
+                       entry := entries[i]
+                       if entry.TimestampMs <= *scan.asOfTimestamp {
+                               snap = 
scan.metadata.SnapshotByID(entry.SnapshotID)
+
+                               break
+                       }
+               }
+       }
+
+       if snap == nil || snap.SchemaID == nil {
+               return curSchema, nil

Review Comment:
   For a default scan with no explicit snapshot/as-of, this returns 
`CurrentSchema()` rather than resolving `CurrentSnapshot().SchemaID`. Is that 
intentional after schema-only metadata updates? If so, please add a short 
comment documenting why default scans differ from 
`WithSnapshotID(currentSnapshotID)`.



##########
table/scanner.go:
##########
@@ -353,6 +340,39 @@ func (scan *Scan) Projection() (*iceberg.Schema, error) {
        return schema, nil
 }
 
+func (scan *Scan) effectiveSchema() (*iceberg.Schema, error) {
+       curSchema := scan.metadata.CurrentSchema()
+       var snap *Snapshot
+       if scan.snapshotID != nil {
+               snap = scan.metadata.SnapshotByID(*scan.snapshotID)
+               if snap == nil {
+                       return nil, fmt.Errorf("%w: snapshot not found: %d", 
ErrInvalidOperation, *scan.snapshotID)
+               }
+       } else if scan.asOfTimestamp != nil {
+               entries := slices.Collect(scan.metadata.SnapshotLogs())
+               for i := len(entries) - 1; i >= 0; i-- {
+                       entry := entries[i]
+                       if entry.TimestampMs <= *scan.asOfTimestamp {
+                               snap = 
scan.metadata.SnapshotByID(entry.SnapshotID)
+
+                               break
+                       }
+               }
+       }
+
+       if snap == nil || snap.SchemaID == nil {
+               return curSchema, nil
+       }
+
+       for _, schema := range scan.metadata.Schemas() {
+               if schema.ID == *snap.SchemaID {
+                       return schema, nil
+               }
+       }
+
+       return curSchema, nil

Review Comment:
   When `snap.SchemaID` is non-nil but no matching schema exists, this falls 
back to `CurrentSchema()`. That differs from the legacy no-schema-id fallback 
and can silently bind predicates/pruning against the wrong field IDs, 
incorrectly dropping candidate manifests/files on malformed or partially-loaded 
metadata. Please only fall back when `snap.SchemaID == nil`; if `snap.SchemaID 
!= nil` and lookup fails, return an 
`ErrInvalidMetadata`/`ErrInvalidOperation`-wrapped error naming the snapshot ID 
and missing schema ID. Also add a test for a non-nil-but-unknown snapshot 
schema-id.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to