glitchy opened a new issue, #2654: URL: https://github.com/apache/iceberg-rust/issues/2654

### What's the problem?

`ParquetWriter` matches record batch columns to Iceberg fields by field id, reading the `PARQUET:field_id` (`PARQUET_FIELD_ID_META_KEY`) metadata off each Arrow field. When a caller hands it a record batch whose Arrow schema was built by hand--without that metadata--the write fails deep in value extraction with:

```
DataInvalid => Field id N not found in struct array
```

The message points at the symptom (`crates/iceberg/src/arrow/value.rs`), not the cause: the incoming schema carries no field ids at all. The right way to build the batch schema is to derive it from the table schema via `table.metadata().current_schema().as_ref().try_into()`, which stamps the field-id metadata--but nothing surfaces that, so it's an easy trap for downstream consumers.

Reported by @malon64 while testing #2185 from a downstream Rust ingestion tool.

### Proposed fix

Fail fast at the writer boundary. When matching by field id (`FieldMatchMode::Id`), validate the incoming record batch's Arrow schema on the first write and return a clear `DataInvalid` error that names the field(s) missing `PARQUET:field_id` and points at `current_schema().as_ref().try_into()`.

- Purely additive--schemas built the right way are unaffected; only malformed hand-built schemas now fail early with an actionable message instead of opaquely at value-extraction time.
- Recurses into nested struct/list fields. Skips the Arrow map `entries` wrapper, which has no Iceberg field id of its own (only its key/value do).

### Willingness to contribute

I can contribute this fix independently--PR incoming.
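
To illustrate the proposed check, here is a minimal std-only sketch of the recursive walk. It is not the actual patch and does not use the `arrow_schema` or `iceberg` crates: the `Field` and `DataType` types, the `collect_missing` and `validate_field_ids` helpers, and the error wording are all hypothetical stand-ins. Only the `PARQUET:field_id` key and the "skip the map `entries` wrapper" rule come from the issue text. The sketch checks only that the key is present, not that its value parses as an integer.

```rust
use std::collections::HashMap;

// Metadata key named in the issue; everything else below is a simplified model.
const PARQUET_FIELD_ID_META_KEY: &str = "PARQUET:field_id";

/// Hypothetical stand-in for an Arrow data type (not arrow_schema::DataType).
enum DataType {
    Primitive,
    Struct(Vec<Field>),
    List(Box<Field>),
    /// Holds the `entries` wrapper field; its struct children are key and value.
    Map(Box<Field>),
}

/// Hypothetical stand-in for an Arrow field with string metadata.
struct Field {
    name: String,
    data_type: DataType,
    metadata: HashMap<String, String>,
}

impl Field {
    /// Builds a field, stamping the field-id metadata only when an id is given.
    fn new(name: &str, data_type: DataType, field_id: Option<i32>) -> Self {
        let mut metadata = HashMap::new();
        if let Some(id) = field_id {
            metadata.insert(PARQUET_FIELD_ID_META_KEY.to_string(), id.to_string());
        }
        Field { name: name.to_string(), data_type, metadata }
    }
}

/// Records the dotted path of every field lacking the field-id key.
/// `check_self` is false only for the map `entries` wrapper.
fn collect_missing(field: &Field, path: &str, check_self: bool, missing: &mut Vec<String>) {
    let full = if path.is_empty() {
        field.name.clone()
    } else {
        format!("{}.{}", path, field.name)
    };
    if check_self && !field.metadata.contains_key(PARQUET_FIELD_ID_META_KEY) {
        missing.push(full.clone());
    }
    match &field.data_type {
        DataType::Primitive => {}
        DataType::Struct(children) => {
            for child in children {
                collect_missing(child, &full, true, missing);
            }
        }
        DataType::List(element) => collect_missing(element, &full, true, missing),
        // The wrapper itself is skipped, but its key/value children are checked.
        DataType::Map(entries) => collect_missing(entries, &full, false, missing),
    }
}

/// Fails fast with one message naming every offending field.
fn validate_field_ids(fields: &[Field]) -> Result<(), String> {
    let mut missing = Vec::new();
    for field in fields {
        collect_missing(field, "", true, &mut missing);
    }
    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!(
            "Arrow field(s) missing `{}` metadata: {}; derive the Arrow schema \
             from the table schema via current_schema().as_ref().try_into()",
            PARQUET_FIELD_ID_META_KEY,
            missing.join(", ")
        ))
    }
}
```

In this model, a struct column `point` that carries an id but has a child `x` without one would be reported as `point.x`, while a map column whose `entries` wrapper has no id but whose key and value do would pass.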
