tschaub opened a new pull request, #37817: URL: https://github.com/apache/arrow/pull/37817
### Rationale for this change

This makes it possible to round-trip a Parquet schema through the `pqarrow.FromParquet()` and `pqarrow.ToParquet()` functions while preserving the field repetition from the original schema.

### What changes are included in this PR?

The changes are isolated to the `pqarrow.ToParquet()` function. Field repetition from the original Parquet schema is retained by adding a `"PARQUET:repetition"` metadata value to the Arrow field. This is similar to the existing `"PARQUET:field_id"` metadata.

### Are these changes tested?

A new `TestRoundTripSchema` function is added, and existing tests are updated to reflect the new field metadata.

### Are there any user-facing changes?

Before this change, if you started with a Parquet schema that had a repeated field, then called `pqarrow.FromParquet()` to create an Arrow schema and `pqarrow.ToParquet()` to create a Parquet schema again, the previously repeated field would come back as a list. After this change, the repeated field remains a repeated field.

I've only handled repeated primitives. I assume repeated groups are transformed into lists of groups.
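To illustrate the metadata mechanism described above, here is a minimal, self-contained sketch in Go. It does not use the `pqarrow` API. `ParquetField`, `ArrowField`, `fromParquet`, and `toParquet` are hypothetical stand-ins invented for this example, and both the metadata value encoding (`"REPEATED"`) and the point at which the metadata is written are assumptions. Only the `"PARQUET:repetition"` key comes from the PR description.

```go
package main

// Repetition mirrors the three Parquet field repetition kinds.
type Repetition string

const (
	Required Repetition = "REQUIRED"
	Optional Repetition = "OPTIONAL"
	Repeated Repetition = "REPEATED"
)

// repetitionKey is the metadata key named in the PR description.
// The string values stored under it are an assumption of this sketch.
const repetitionKey = "PARQUET:repetition"

// ParquetField is a hypothetical stand-in for a primitive Parquet schema node.
type ParquetField struct {
	Name       string
	Repetition Repetition
	IsList     bool // true when the field is a LIST-annotated group
}

// ArrowField is a hypothetical stand-in for an Arrow field.
type ArrowField struct {
	Name     string
	IsList   bool
	Nullable bool
	Metadata map[string]string
}

// fromParquet models the Parquet-to-Arrow direction: a repeated primitive
// becomes a list, and the original repetition is recorded in field metadata.
func fromParquet(f ParquetField) ArrowField {
	return ArrowField{
		Name:     f.Name,
		IsList:   f.IsList || f.Repetition == Repeated,
		Nullable: f.Repetition == Optional,
		Metadata: map[string]string{repetitionKey: string(f.Repetition)},
	}
}

// toParquet models the Arrow-to-Parquet direction. If the metadata says the
// list started life as a repeated field, the repeated field is restored;
// otherwise the list is written as a LIST-annotated group.
func toParquet(f ArrowField) ParquetField {
	if f.IsList && f.Metadata[repetitionKey] == string(Repeated) {
		return ParquetField{Name: f.Name, Repetition: Repeated}
	}
	rep := Required
	if f.Nullable {
		rep = Optional
	}
	return ParquetField{Name: f.Name, Repetition: rep, IsList: f.IsList}
}
```

In this model, `toParquet(fromParquet(f))` returns `f` unchanged for a repeated primitive field. If the `"PARQUET:repetition"` entry is deleted from the intermediate `ArrowField`, the same call returns a list instead, which corresponds to the behavior before the change.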
