[GitHub] [arrow-rs] mhilton opened a new issue, #4702: parquet: support setting the field_id with an ArrowWriter

via GitHub Wed, 16 Aug 2023 02:03:44 -0700


mhilton opened a new issue, #4702:
URL: https://github.com/apache/arrow-rs/issues/4702

**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
We would like to use Parquet files written from a set of Arrow record
batches as part of an Apache Iceberg snapshot without modification. The
Apache Iceberg [Parquet
specification](https://iceberg.apache.org/spec/#parquet) requires that
field IDs are present.

**Describe the solution you'd like**
The
[solution](https://github.com/apache/arrow/blob/f3010bac94cbd588ecebd6e7839f9d56e97b1a9b/go/parquet/pqarrow/schema.go#L397)
implemented by (at least) the Go parquet package seems reasonable. This uses a
metadata value with the key `PARQUET:field_id` to determine the field_id when
converting an arrow schema into a parquet schema. If there is no such metadata
entry then the field_id will not be present.
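
A minimal sketch of the lookup described above, using only the standard library. The helper name `field_id_from_metadata` is hypothetical, and treating a value that does not parse as an integer the same as a missing entry is an assumption; this is not the arrow-rs or Go implementation.

```rust
use std::collections::HashMap;

/// Metadata key named in this proposal.
const FIELD_ID_KEY: &str = "PARQUET:field_id";

/// Hypothetical helper: returns the field_id recorded in an Arrow field's
/// metadata map, or `None` when the key is missing.
/// Assumption: a value that does not parse as an `i32` is treated as absent.
fn field_id_from_metadata(metadata: &HashMap<String, String>) -> Option<i32> {
    metadata
        .get(FIELD_ID_KEY)
        .and_then(|value| value.trim().parse::<i32>().ok())
}
```

With this shape, the schema converter would only set `field_id` on the parquet schema element when the helper returns `Some`, so schemas without the metadata entry would be written as they are today.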

**Describe alternatives you've considered**
An alternative would be to add a mechanism to `WriterProperties` to specify
the `field_id` to use with a column. This presumably would work in a similar
manner to
[encoding](https://docs.rs/parquet/latest/parquet/file/properties/struct.WriterProperties.html#method.encoding).
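
As a rough illustration of this alternative, the sketch below models a per-column `field_id` setting with standard-library types only. `FieldIdPropsBuilder`, `FieldIdProps`, `set_column_field_id`, and `field_id` are hypothetical names chosen for this example; none of them exist in the parquet crate, and keying columns by a dotted path string is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a writer-properties builder.
/// This is NOT the parquet crate's `WriterProperties` builder.
#[derive(Default)]
struct FieldIdPropsBuilder {
    field_ids: HashMap<String, i32>,
}

impl FieldIdPropsBuilder {
    /// Hypothetical setter: record the field_id for one column path,
    /// written here as a dotted string such as "a.b".
    fn set_column_field_id(mut self, column_path: &str, field_id: i32) -> Self {
        self.field_ids.insert(column_path.to_string(), field_id);
        self
    }

    /// Freeze the collected settings.
    fn build(self) -> FieldIdProps {
        FieldIdProps { field_ids: self.field_ids }
    }
}

/// Hypothetical immutable properties holding the per-column field_ids.
struct FieldIdProps {
    field_ids: HashMap<String, i32>,
}

impl FieldIdProps {
    /// Per-column lookup; `None` means no field_id was configured.
    fn field_id(&self, column_path: &str) -> Option<i32> {
        self.field_ids.get(column_path).copied()
    }
}
```

Compared with the metadata approach, this keeps the Arrow schema untouched but requires the caller to keep the column paths in the properties in sync with the schema.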

**Additional context**
N/A

