etseidl commented on issue #6648:
URL: https://github.com/apache/arrow-rs/issues/6648#issuecomment-2445824944

   TIL, thanks! One question, however. The spec actually says _A repeated field 
that is neither contained by a LIST- or MAP-annotated group nor annotated by 
LIST or MAP should be interpreted as a required list of required elements where 
the element type is the type of the field_. So is it allowable for the max def 
level to be 1 here? If the elements are required, shouldn't max_def be 0? TBH 
I'm not sure why the spec is worded that way...nulls are clearly detectable so 
I'd think the required vs optional could be deduced from the max def level. In 
fact, arrow-cpp/pyarrow seems to read the given file just fine (although the 
arrow schema seems to indicate all elements are not null).
   ```
   >>> import pyarrow.parquet as pq
   >>> df = pq.read_table('repeated_no_list.parquet')
   >>> df
   pyarrow.Table
   Int32: list<Int32: int32 not null> not null
     child 0, Int32: int32 not null
   String: list<String: string not null> not null
     child 0, String: string not null
   ----
   Int32: [[[0,1,2,3],[],[4],[5,6,7,8]]]
   String: [[["foo","zero","one","two"],["three"],["four"],["five","six","seven","eight"]]]
   ```
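
For reference on the max def level question, here is a rough sketch of how the maximum definition and repetition levels are usually derived, assuming the Dremel-style rule that every non-required field on the column path adds one definition level and every repeated field adds one repetition level. `max_levels` is a hypothetical helper written for illustration only; it is not part of pyarrow or arrow-rs.

```python
# Hypothetical helper for illustration; not a pyarrow or arrow-rs API.
VALID_REPETITIONS = ("required", "optional", "repeated")


def max_levels(path_repetitions):
    """Return (max_def, max_rep) for a leaf column.

    path_repetitions lists the repetition of each field on the path from
    the schema root down to the leaf, e.g. ["optional", "repeated", "required"].
    """
    max_def = 0
    max_rep = 0
    for repetition in path_repetitions:
        if repetition not in VALID_REPETITIONS:
            raise ValueError(f"unknown repetition: {repetition!r}")
        # Assumed rule: optional and repeated fields each add a definition level.
        if repetition != "required":
            max_def += 1
        # Assumed rule: only repeated fields add a repetition level.
        if repetition == "repeated":
            max_rep += 1
    return max_def, max_rep


# A bare repeated primitive field, as in the file discussed here.
bare_repeated = max_levels(["repeated"])

# A three-level LIST: required group, repeated group, required element.
required_list_of_required = max_levels(["required", "repeated", "required"])

# A three-level LIST: optional group, repeated group, optional element.
optional_list_of_optional = max_levels(["optional", "repeated", "optional"])
```

Under that assumption, the bare repeated field and the required list of required elements both come out at max_def 1 and max_rep 1, because the repeated field itself contributes a definition level (0 would then mark an empty list rather than a null element).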


