rustyconover opened a new issue, #9444: URL: https://github.com/apache/arrow-rs/issues/9444
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

The Arrow IPC format supports a `custom_metadata` field on the `Message` flatbuffer envelope ([Message.fbs](https://github.com/apache/arrow/blob/main/format/Message.fbs#L154)), allowing per-batch metadata separate from schema-level metadata.

Currently, the Rust `RecordBatch` struct has no `custom_metadata` field, and the IPC reader and writer ignore it. PyArrow has supported this since v11.0.0 via `write_batch(batch, custom_metadata=...)` and `read_next_batch_with_custom_metadata()`. This means IPC files written by PyArrow with per-batch metadata lose that metadata when read by arrow-rs.

**Describe the solution you'd like**

1. Add a `custom_metadata: HashMap<String, String>` field to `RecordBatch` with accessor methods (`custom_metadata()`, `custom_metadata_mut()`, `with_custom_metadata()`, `into_parts_with_custom_metadata()`)
2. IPC writer: serialize `custom_metadata` to the `Message` flatbuffer when writing record batches
3. IPC reader: extract `custom_metadata` from the `Message` at all reader call sites (`FileDecoder`, `StreamReader`, `StreamDecoder`)
4. arrow-flight: extract and propagate `custom_metadata` in `flight_data_to_arrow_batch`
5. arrow-select: preserve `custom_metadata` through `filter_record_batch` and `take_record_batch`
6. Preserve metadata through `slice()`, `project()`, `normalize()`, `with_schema()`, and `remove_column()`

**Describe alternatives you've considered**

- Storing per-batch metadata in schema-level metadata with a naming convention: this conflates two levels of metadata and doesn't match the IPC format's intent.
- An `Option<HashMap<String, String>>` instead of `HashMap<String, String>`: `HashMap::new()` is zero-allocation so the overhead is minimal, and `Option` complicates every accessor for little gain.
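To make the proposed shape concrete, here is a minimal sketch using only the standard library. `Batch` and `concat` are hypothetical stand-ins, not arrow-rs types; only the accessor names (`custom_metadata()`, `custom_metadata_mut()`, `with_custom_metadata()`) are taken from the proposal above. It illustrates a single-batch operation carrying metadata through, a multi-batch merge dropping it, and the empty default not reserving any capacity.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `RecordBatch`: one column of values plus the
/// proposed per-batch metadata. This is NOT the arrow-rs type.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    rows: Vec<i64>,
    custom_metadata: HashMap<String, String>,
}

impl Batch {
    /// New batches start with empty metadata. `HashMap::new()` has capacity 0
    /// and does not allocate until the first insert.
    pub fn new(rows: Vec<i64>) -> Self {
        Self { rows, custom_metadata: HashMap::new() }
    }

    /// Read-only view of the per-batch metadata.
    pub fn custom_metadata(&self) -> &HashMap<String, String> {
        &self.custom_metadata
    }

    /// Mutable access, for inserting or removing individual keys.
    pub fn custom_metadata_mut(&mut self) -> &mut HashMap<String, String> {
        &mut self.custom_metadata
    }

    /// Builder-style setter that replaces the metadata wholesale.
    pub fn with_custom_metadata(mut self, metadata: HashMap<String, String>) -> Self {
        self.custom_metadata = metadata;
        self
    }

    pub fn num_rows(&self) -> usize {
        self.rows.len()
    }

    /// Single-batch operation: the output has exactly one source batch, so
    /// its metadata is carried over unchanged.
    pub fn slice(&self, offset: usize, length: usize) -> Self {
        Self {
            rows: self.rows[offset..offset + length].to_vec(),
            custom_metadata: self.custom_metadata.clone(),
        }
    }
}

/// Multi-batch merge (hypothetical helper): inputs may carry different
/// metadata, so the result starts with empty metadata.
pub fn concat(batches: &[Batch]) -> Batch {
    let rows = batches.iter().flat_map(|b| b.rows.iter().copied()).collect();
    Batch::new(rows)
}
```

With this sketch, `Batch::new(vec![1, 2, 3]).with_custom_metadata(meta).slice(0, 2)` keeps `meta`, while `concat` of any batches returns empty metadata. A real implementation would hold the columns and schema as `RecordBatch` does today; the `Vec<i64>` here only keeps the example self-contained.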
**Additional context**

- `HashMap::new()` does not heap-allocate, so there is no performance concern for the default (empty metadata) case.
- The existing `into_parts()` signature is unchanged for backward compatibility; a new `into_parts_with_custom_metadata()` is added.
- Multi-batch merge operations (`concat_batches`, `interleave_record_batch`, `BatchCoalescer`) intentionally do not propagate per-batch metadata since the semantics are ambiguous when merging batches with different metadata.
- Reuses existing `metadata_to_fb` (convert.rs) for writing and the KV extraction pattern for reading.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
