etseidl commented on issue #6648:
URL: https://github.com/apache/arrow-rs/issues/6648#issuecomment-2445824944
TIL, thanks! One question, however. The spec actually says _A repeated field
that is neither contained by a LIST- or MAP-annotated group nor annotated by
LIST or MAP should be interpreted as a required list of required elements where
the element type is the type of the field_. So is it valid for the max def
level to be 1 here? If the elements are required, shouldn't max_def be 0? TBH,
I'm not sure why the spec is worded that way... nulls are clearly detectable, so
I'd think required vs. optional could be deduced from the max def level. In
fact, arrow-cpp/pyarrow seems to read the given file just fine (although the
Arrow schema seems to indicate that all elements are non-null).
```
>>> import pyarrow.parquet as pq
>>> df = pq.read_table('repeated_no_list.parquet')
>>> df
pyarrow.Table
Int32: list<Int32: int32 not null> not null
  child 0, Int32: int32 not null
String: list<String: string not null> not null
  child 0, String: string not null
----
Int32: [[[0,1,2,3],[],[4],[5,6,7,8]]]
String: [[["foo","zero","one","two"],["three"],["four"],["five","six","seven","eight"]]]
```
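To make the level question concrete, here is a minimal plain-Python sketch (no pyarrow; the helpers `max_levels` and `assemble_bare_repeated` and the string-based schema model are hypothetical) of the standard level rules: every `optional` or `repeated` field on a leaf's path adds one to the max definition level, and every `repeated` field adds one to the max repetition level. For a bare top-level `repeated int32` field this gives max_def = 1, where def level 0 marks an empty list rather than a null element. The rep/def levels below are hand-written for rows like the `Int32` column above; they are not read from `repeated_no_list.parquet`.

```python
from typing import List, Sequence, Tuple

# Hypothetical, simplified schema model: each path element is just its repetition.
REQUIRED, OPTIONAL, REPEATED = "required", "optional", "repeated"


def max_levels(path: Sequence[str]) -> Tuple[int, int]:
    """Return (max_def, max_rep) for a leaf whose path has these repetitions."""
    # optional and repeated fields each add a definition level
    max_def = sum(1 for r in path if r in (OPTIONAL, REPEATED))
    # only repeated fields add a repetition level
    max_rep = sum(1 for r in path if r == REPEATED)
    return max_def, max_rep


def assemble_bare_repeated(
    rep_levels: Sequence[int], def_levels: Sequence[int], values: Sequence[object]
) -> List[List[object]]:
    """Rebuild rows for a single top-level `repeated` leaf (max_def = max_rep = 1).

    rep == 0 starts a new row; def == 0 means that row's list is empty;
    def == 1 means one element is present (there is no level left for a null).
    """
    rows: List[List[object]] = []
    value_iter = iter(values)
    for rep, d in zip(rep_levels, def_levels):
        if rep == 0:
            rows.append([])
        if d == 1:
            rows[-1].append(next(value_iter))
    return rows


if __name__ == "__main__":
    print(max_levels([REQUIRED]))                      # (0, 0): plain required column
    print(max_levels([REPEATED]))                      # (1, 1): bare repeated field
    print(max_levels([OPTIONAL, REPEATED, OPTIONAL]))  # (3, 1): 3-level LIST, nullable elements
    rep = [0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
    dfn = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1]
    print(assemble_bare_repeated(rep, dfn, list(range(9))))
    # [[0, 1, 2, 3], [], [4], [5, 6, 7, 8]]
```

Under these rules the single definition level of a bare repeated field is consumed by the empty-vs-non-empty distinction, so a nullable element would need an extra level (as in the 3-level LIST case).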