mhilton opened a new issue, #4702:
URL: https://github.com/apache/arrow-rs/issues/4702

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   We would like to use the Parquet files written from a set of Arrow record batches as part of an Apache Iceberg snapshot without modification. The Apache Iceberg [Parquet specification](https://iceberg.apache.org/spec/#parquet) requires field IDs to be present.
   
   **Describe the solution you'd like**
   The [solution](https://github.com/apache/arrow/blob/f3010bac94cbd588ecebd6e7839f9d56e97b1a9b/go/parquet/pqarrow/schema.go#L397) implemented by (at least) the Go Parquet package seems reasonable. It uses a metadata value with the key `PARQUET:field_id` to determine the field_id when converting an Arrow schema into a Parquet schema. If a field has no such metadata entry, no field_id is written for it.
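   A minimal sketch of the metadata lookup described above, written against a plain `std::collections::HashMap<String, String>` (the type returned by arrow-rs's `Field::metadata()`) so that it runs without any crates. The function name `field_id_from_metadata` is hypothetical, and treating unparseable or negative values as "no field_id" is an assumption, not confirmed behavior of the Go implementation.

```rust
use std::collections::HashMap;

/// Metadata key the Go implementation reads to find a field's Parquet field_id.
const PARQUET_FIELD_ID_KEY: &str = "PARQUET:field_id";

/// Hypothetical helper: returns the field_id to write for an Arrow field,
/// given that field's metadata map.
///
/// Assumption (not confirmed against any implementation): a missing key,
/// a non-integer value, or a negative value all yield `None`, meaning
/// "write no field_id for this field".
fn field_id_from_metadata(metadata: &HashMap<String, String>) -> Option<i32> {
    metadata
        .get(PARQUET_FIELD_ID_KEY)
        .and_then(|v| v.parse::<i32>().ok())
        .filter(|id| *id >= 0)
}

fn main() {
    // A field carrying an explicit id.
    let mut with_id = HashMap::new();
    with_id.insert(PARQUET_FIELD_ID_KEY.to_string(), "42".to_string());

    // A field with no metadata at all.
    let without_id: HashMap<String, String> = HashMap::new();

    println!("{:?}", field_id_from_metadata(&with_id)); // Some(42)
    println!("{:?}", field_id_from_metadata(&without_id)); // None
}
```

   Because the id travels with each Arrow field's metadata, nested fields would each carry their own entry, and the schema conversion would only need to call this lookup once per field it visits.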
   
   **Describe alternatives you've considered**
   An alternative would be to add a mechanism to `WriterProperties` for specifying the `field_id` to use for a column. This would presumably work in a similar manner to [encoding](https://docs.rs/parquet/latest/parquet/file/properties/struct.WriterProperties.html#method.encoding).
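   A rough sketch of what such a per-column setting might look like. Everything here is hypothetical: `FieldIdProperties` and `set_column_field_id` are not part of the `parquet` crate, and the dotted string keys only loosely mirror how per-column encodings are keyed by `ColumnPath`.

```rust
use std::collections::HashMap;

/// Hypothetical per-column field_id settings, keyed by dotted column path
/// (e.g. "address.city"). Not a real `parquet` crate type.
#[derive(Debug, Default)]
struct FieldIdProperties {
    by_column: HashMap<String, i32>,
}

impl FieldIdProperties {
    /// Hypothetical builder-style setter, analogous in spirit to setting a
    /// per-column encoding.
    fn set_column_field_id(mut self, column_path: &str, field_id: i32) -> Self {
        self.by_column.insert(column_path.to_string(), field_id);
        self
    }

    /// Returns the field_id to write for a column; `None` means omit it.
    fn field_id(&self, column_path: &str) -> Option<i32> {
        self.by_column.get(column_path).copied()
    }
}

fn main() {
    let props = FieldIdProperties::default()
        .set_column_field_id("id", 1)
        .set_column_field_id("address.city", 2);

    println!("{:?}", props.field_id("address.city")); // Some(2)
    println!("{:?}", props.field_id("name")); // None
}
```

   One design question this approach would have to answer: Iceberg assigns ids to nested struct, list, and map fields as well as to leaf columns, so the keys would need to address group nodes too. The metadata approach sidesteps this by attaching the id to each Arrow `Field` directly.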
   
   **Additional context**
   N/A

