laskoviymishka commented on code in PR #1045:
URL: https://github.com/apache/iceberg-go/pull/1045#discussion_r3237047732


##########
table/scanner.go:
##########
@@ -674,3 +690,79 @@ func (scan *Scan) ToArrowTable(ctx context.Context) 
(arrow.Table, error) {
 
        return array.NewTableFromRecords(schema, records), nil
 }
+
+func schemaContainsMeta(schema *iceberg.Schema) bool {
+       if schema == nil {
+               return false
+       }
+
+       _, hasRowIdMeta := schema.FindFieldByID(iceberg.RowIDFieldID)

Review Comment:
   I think there's a subtle issue here. `FindFieldByID(iceberg.RowIDFieldID)` 
basically never finds anything once a schema goes through `NewMetadata` — 
`reassignIDs` calls `AssignFreshSchemaIDs` (`schema.go:1366`), which renumbers 
every field starting from 1, so a schema passed in with `iceberg.RowID()` ends 
up with `_row_id` at id=3, not at `MaxInt32-107`.
   
   The practical consequence on any v3 table that legitimately declares 
`_row_id` in its schema: this returns false → synthesis path is taken → 
`removeMetadataFromSelectedFields` strips `_row_id` from the select list → a 
synthetic all-null `_row_id` is appended at the reserved id. The user's real 
`_row_id` column is silently dropped from the projection.
   
   The `…SchemaWithRowIDOnly` / `…WithLastUpdatedSequenceNumberOnly` tests look 
like they cover this case but actually go down the synthesis path too — their 
`rowIDField.ID == RowIDFieldID` assertion is trivially true for the synthesized 
field, so they pass without proving the user's column survived.
   
   A couple of ways to handle: look up by name 
(`FindFieldByName(iceberg.RowIDColumnName)`) instead of reserved id, or have 
`removeMetadataFromSelectedFields` skip the strip when the schema already has a 
field with that name. wdyt?



##########
table/scanner.go:
##########
@@ -262,6 +265,19 @@ func (scan *Scan) Projection() (*iceberg.Schema, error) {
                return curSchema, nil
        }
 
+       hasRowLineageMeta := selectedFieldsContainsMeta(scan.selectedFields, 
caseSensitive)
+       schemaHasRowLineageMeta := schemaContainsMeta(curSchema)
+       if hasRowLineageMeta && curVersion >= minFormatVersionRowLineage && 
!schemaHasRowLineageMeta {
+
+               removedMetadataSlice, missingMetaFields := 
removeMetadataFromSelectedFields(scan.selectedFields, caseSensitive)
+               sch, err := curSchema.Select(scan.caseSensitive, 
removedMetadataSlice...)
+               if err != nil {
+                       return nil, err
+               }
+
+               return iceberg.NewSchema(sch.ID, append(sch.Fields(), 
missingMetaFields...)...), nil

Review Comment:
   Carrying over from the previous round — `iceberg.NewSchema` calls 
`NewSchemaWithIdentifiers(id, []int{}, ...)`, so `IdentifierFieldIDs` get 
dropped here. `sch` from `Select` carries them correctly; the wrap throws them 
away. `NewSchemaWithIdentifiers(sch.ID, sch.IdentifierFieldIDs, 
append(sch.Fields(), missingMetaFields...)...)` would preserve them.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to