zeevm opened a new issue, #2394: https://github.com/apache/arrow-rs/issues/2394
A field with `REPEATED` repetition and no `LIST` annotation is read as a primitive instead of as a list.

To reproduce: create a file with a top-level field schema like `REPEATED BYTE_ARRAY vals (UTF8);` and write lists of strings (i.e. with repetition levels of 0 and 1). This should be read as a list of strings, as specified in https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#lists:

> This does not affect repeated fields that are not annotated: A repeated field that is neither contained by a LIST- or MAP-annotated group nor annotated by LIST or MAP should be interpreted as a required list of required elements where the element type is the type of the field.

Instead, it is read as a field of single string values: the strings that make up one logical list are read as distinct rows.

It is read correctly by pyarrow.
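For illustration, here is a minimal sketch of the grouping the spec rule implies for this column. It is not arrow-rs code; `assemble_lists` is a hypothetical helper. It assumes the field is top-level with no enclosing groups, so max repetition level = 1 and max definition level = 1. A repetition level of 0 starts a new row, 1 continues the current row's list, and a definition level of 0 marks an empty list.

```rust
/// Hypothetical helper (not an arrow-rs API): regroup a top-level
/// `REPEATED BYTE_ARRAY` column into one list per row.
/// Assumes max_rep_level = 1 and max_def_level = 1.
fn assemble_lists(values: &[&str], rep_levels: &[i16], def_levels: &[i16]) -> Vec<Vec<String>> {
    let mut rows: Vec<Vec<String>> = Vec::new();
    let mut vals = values.iter();
    for (&rep, &def) in rep_levels.iter().zip(def_levels) {
        if rep == 0 {
            // Repetition level 0: this entry begins a new row.
            rows.push(Vec::new());
        }
        if def == 1 {
            // Definition level 1: a value is present; append it to the current row's list.
            let v = vals.next().expect("fewer values than defined slots");
            rows.last_mut()
                .expect("first level must have repetition level 0")
                .push(v.to_string());
        }
        // Definition level 0: the row's list is empty, so nothing is appended.
    }
    rows
}

fn main() {
    // Three rows: ["a", "b", "c"], [], ["d"]
    let values = ["a", "b", "c", "d"];
    let rep: [i16; 5] = [0, 1, 1, 0, 0];
    let def: [i16; 5] = [1, 1, 1, 0, 1];
    let rows = assemble_lists(&values, &rep, &def);
    println!("{:?}", rows); // prints [["a", "b", "c"], [], ["d"]]
}
```

The reported bug corresponds to ignoring the repetition levels: all four strings come back as separate rows instead of three list-valued rows.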
