tschaub opened a new pull request, #37817:
URL: https://github.com/apache/arrow/pull/37817

   ### Rationale for this change
   
   This makes it possible to round trip a Parquet schema through the 
`pqarrow.FromParquet()` and `pqarrow.ToParquet()` functions, maintaining the 
field repetition from the original schema.
   
   ### What changes are included in this PR?
   
   The changes are isolated to the `pqarrow.ToParquet()` function.  Field 
repetition from the original Parquet schema is retained by adding a 
`"PARQUET:repetition"` metadata value to the Arrow field.  This is similar to 
the existing `"PARQUET:field_id"` metadata.
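
As a rough illustration of the idea (not the actual `pqarrow` code), the sketch below models the round trip with simplified stand-in types. `ParquetField`, `ArrowField`, `fromParquet`, and `toParquet` are hypothetical names invented for this example, and the metadata value `"repeated"` is an assumption; only the `"PARQUET:repetition"` key comes from this PR.

```go
package main

import "fmt"

// Repetition mirrors the three Parquet repetition levels.
type Repetition string

const (
	Required Repetition = "required"
	Optional Repetition = "optional"
	Repeated Repetition = "repeated"
)

// repetitionKey is the metadata key named in this PR. The values stored
// under it in this sketch are an assumption.
const repetitionKey = "PARQUET:repetition"

// ParquetField is a hypothetical, simplified stand-in for a Parquet node.
type ParquetField struct {
	Name       string
	Type       string
	Repetition Repetition
	IsList     bool // true for a LIST-annotated group, false for a bare field
}

// String renders the field roughly the way a Parquet schema dump would.
func (p ParquetField) String() string {
	if p.IsList {
		return fmt.Sprintf("%s group %s (LIST) of %s", p.Repetition, p.Name, p.Type)
	}
	return fmt.Sprintf("%s %s %s", p.Repetition, p.Type, p.Name)
}

// ArrowField is a hypothetical, simplified stand-in for an Arrow field.
type ArrowField struct {
	Name     string
	ElemType string
	IsList   bool
	Nullable bool
	Metadata map[string]string
}

// fromParquet maps a repeated primitive to an Arrow list and records the
// original repetition in the field metadata.
func fromParquet(p ParquetField) ArrowField {
	return ArrowField{
		Name:     p.Name,
		ElemType: p.Type,
		IsList:   p.Repetition == Repeated || p.IsList,
		Nullable: p.Repetition == Optional,
		Metadata: map[string]string{repetitionKey: string(p.Repetition)},
	}
}

// toParquet restores a bare repeated field when the metadata says the list
// started out as one; without that hint the list becomes a LIST group.
func toParquet(a ArrowField) ParquetField {
	rep := Required
	if a.Nullable {
		rep = Optional
	}
	if !a.IsList {
		return ParquetField{Name: a.Name, Type: a.ElemType, Repetition: rep}
	}
	// Reading from a nil map is safe in Go and yields the empty string.
	if a.Metadata[repetitionKey] == string(Repeated) {
		return ParquetField{Name: a.Name, Type: a.ElemType, Repetition: Repeated}
	}
	return ParquetField{Name: a.Name, Type: a.ElemType, Repetition: rep, IsList: true}
}
```

In this toy model, `toParquet(fromParquet(f))` returns `f` unchanged for a repeated primitive, while clearing the metadata before calling `toParquet` yields a LIST group instead, which mirrors the behavior described under user-facing changes.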
   
   ### Are these changes tested?
   
   A new `TestRoundTripSchema` function is added and existing tests are updated 
to reflect the new field metadata.
   
   ### Are there any user-facing changes?
   
   Before this change, if you started with a Parquet schema that had a repeated 
field, called `pqarrow.FromParquet()` to create an Arrow schema, and then called 
`pqarrow.ToParquet()` to create a Parquet schema, the previously repeated field 
would come back as a list instead of a repeated field.  After this change, the 
repeated field is still a repeated field.
   
   I've only handled repeated primitives.  I assume repeated groups are 
transformed into lists of groups.

