glitchy opened a new issue, #2654:
URL: https://github.com/apache/iceberg-rust/issues/2654

   ### What's the problem?
   
   `ParquetWriter` matches record batch columns to Iceberg fields by field id, reading the `PARQUET:field_id` (`PARQUET_FIELD_ID_META_KEY`) metadata from each Arrow field. When a caller passes a record batch whose Arrow schema was built by hand without that metadata, the write fails deep in value extraction with:
   
   ```
   DataInvalid => Field id N not found in struct array
   ```
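
   To make the failure mode concrete, here is a minimal std-only sketch of id-based column lookup. `SimpleField` and `column_index_for_id` are hypothetical stand-ins, not iceberg-rust or arrow-rs APIs; the only details taken from this issue are the `PARQUET:field_id` metadata key and the error wording. A schema that carries the ids resolves the column, while the same column names without metadata fall through to the "not found" error:

   ```rust
   use std::collections::HashMap;

   // Metadata key named in this issue (`PARQUET_FIELD_ID_META_KEY`).
   const PARQUET_FIELD_ID_META_KEY: &str = "PARQUET:field_id";

   /// Hypothetical stand-in for an Arrow field: a name plus string metadata.
   struct SimpleField {
       name: String,
       metadata: HashMap<String, String>,
   }

   impl SimpleField {
       fn new(name: &str) -> Self {
           SimpleField { name: name.to_string(), metadata: HashMap::new() }
       }

       /// Stamps a field id into the metadata (what a hand-built schema omits).
       fn with_field_id(mut self, id: i32) -> Self {
           self.metadata.insert(PARQUET_FIELD_ID_META_KEY.to_string(), id.to_string());
           self
       }

       /// Reads the field id back out; `None` if absent or unparseable.
       fn field_id(&self) -> Option<i32> {
           self.metadata.get(PARQUET_FIELD_ID_META_KEY)?.parse().ok()
       }
   }

   /// Locates the column carrying `id`. Names are never consulted in this
   /// sketch, so a hand-built schema with the "right" names still fails.
   fn column_index_for_id(fields: &[SimpleField], id: i32) -> Result<usize, String> {
       fields
           .iter()
           .position(|f| f.field_id() == Some(id))
           .ok_or_else(|| format!("Field id {} not found in struct array", id))
   }

   fn main() {
       let derived = vec![
           SimpleField::new("id").with_field_id(1),
           SimpleField::new("name").with_field_id(2),
       ];
       let hand_built = vec![SimpleField::new("id"), SimpleField::new("name")];

       let idx = column_index_for_id(&derived, 2).unwrap();
       println!("derived schema: field id 2 -> column `{}`", derived[idx].name);
       // Same column names, but no ids in the metadata: the lookup fails.
       println!("hand-built schema: {:?}", column_index_for_id(&hand_built, 2));
   }
   ```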
   
   The message points at the symptom (raised in `crates/iceberg/src/arrow/value.rs`), not the cause: the incoming schema carries no field ids at all. The right way to build the batch schema is to derive it from the table schema via `table.metadata().current_schema().as_ref().try_into()`, which stamps the field-id metadata. Neither the writer nor the error message surfaces that, so it's an easy trap for downstream consumers.
   
   Reported by @malon64 while testing #2185 from a downstream Rust ingestion tool.
   
   ### Proposed fix
   
   Fail fast at the writer boundary. When matching by field id (`FieldMatchMode::Id`), validate the incoming record batch's Arrow schema on the first write and return a clear `DataInvalid` error that names the field(s) missing `PARQUET:field_id` and points at `current_schema().as_ref().try_into()`.
   
   - Purely additive: schemas built the right way are unaffected. Only malformed hand-built schemas are affected, and they now fail early with an actionable message instead of opaquely at value-extraction time.
   - Recurses into nested struct/list fields. Skips the Arrow map `entries` wrapper, which has no Iceberg field id of its own (only its key/value fields do).
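
   The validation described above could look roughly like the following std-only sketch. It models Arrow types with a hypothetical, simplified `DataType`/`Field` pair rather than the real arrow-rs types, and `missing_field_ids`/`validate_schema` are illustrative names, not the API of the incoming PR. It walks the schema depth-first, records the dotted path of every field lacking `PARQUET:field_id`, descends into struct and list children, and for maps skips the `entries` wrapper while still checking its key/value fields:

   ```rust
   use std::collections::HashMap;

   // Metadata key named in this issue (`PARQUET_FIELD_ID_META_KEY`).
   const PARQUET_FIELD_ID_META_KEY: &str = "PARQUET:field_id";

   /// Hypothetical, simplified model of the nested Arrow types involved.
   enum DataType {
       Primitive,
       Struct(Vec<Field>),
       List(Box<Field>),
       /// Arrow maps wrap key/value in a single `entries` struct field.
       Map(Box<Field>),
   }

   struct Field {
       name: String,
       data_type: DataType,
       metadata: HashMap<String, String>,
   }

   impl Field {
       fn new(name: &str, data_type: DataType) -> Self {
           Field { name: name.to_string(), data_type, metadata: HashMap::new() }
       }

       fn with_id(mut self, id: i32) -> Self {
           self.metadata.insert(PARQUET_FIELD_ID_META_KEY.to_string(), id.to_string());
           self
       }
   }

   /// Records `field` by dotted path if it lacks a field id, then recurses.
   fn visit_field(field: &Field, parent: &str, missing: &mut Vec<String>) {
       let path = if parent.is_empty() { field.name.clone() } else { format!("{}.{}", parent, field.name) };
       if !field.metadata.contains_key(PARQUET_FIELD_ID_META_KEY) {
           missing.push(path.clone());
       }
       visit_children(&field.data_type, &path, missing);
   }

   /// Descends into struct, list, and map children.
   fn visit_children(data_type: &DataType, path: &str, missing: &mut Vec<String>) {
       match data_type {
           DataType::Primitive => {}
           DataType::Struct(children) => children.iter().for_each(|c| visit_field(c, path, missing)),
           DataType::List(element) => visit_field(element, path, missing),
           // Skip the `entries` wrapper itself (no Iceberg field id of its own)
           // but still check its key/value fields under the map's path.
           DataType::Map(entries) => visit_children(&entries.data_type, path, missing),
       }
   }

   /// Dotted paths of every field missing `PARQUET:field_id`.
   fn missing_field_ids(fields: &[Field]) -> Vec<String> {
       let mut missing = Vec::new();
       for field in fields {
           visit_field(field, "", &mut missing);
       }
       missing
   }

   /// Fail-fast check: an actionable error instead of a late lookup failure.
   fn validate_schema(fields: &[Field]) -> Result<(), String> {
       let missing = missing_field_ids(fields);
       if missing.is_empty() {
           return Ok(());
       }
       Err(format!(
           "record batch schema is missing `{}` metadata on field(s): {}; \
            derive the Arrow schema from the table schema via \
            `table.metadata().current_schema().as_ref().try_into()`",
           PARQUET_FIELD_ID_META_KEY,
           missing.join(", ")
       ))
   }

   fn main() {
       let entries = Field::new(
           "entries",
           DataType::Struct(vec![
               Field::new("key", DataType::Primitive).with_id(5),
               Field::new("value", DataType::Primitive), // id forgotten
           ]),
       );
       let element = Field::new("element", DataType::Primitive).with_id(3);
       let schema = vec![
           Field::new("id", DataType::Primitive).with_id(1),
           Field::new("tags", DataType::List(Box::new(element))).with_id(2),
           Field::new("props", DataType::Map(Box::new(entries))).with_id(4),
       ];
       // Reports only `props.value`; the id-less `entries` wrapper is skipped.
       println!("{:?}", missing_field_ids(&schema));
       println!("{}", validate_schema(&schema).unwrap_err());
   }
   ```

   Collecting every missing path before returning, rather than stopping at the first one, keeps the error actionable when several hand-built fields lack ids. Running the check only on the first write assumes every batch handed to the writer shares one schema; a writer that accepted varying schemas would need to re-check when the schema changes.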
   
   ### Willingness to contribute
   
   I can contribute this fix independently; PR incoming.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

