zeroshade commented on PR #37817:
URL: https://github.com/apache/arrow/pull/37817#issuecomment-1731767862

   I'll try to dig into this a bit when I have the chance. Offhand, I'd expect 
the defLevels to differ, since the schema is different and has fewer levels in 
it. That said, I think the issue is that a single repeated primitive node is 
simply not being seen as a LIST field. If we look at the [Parquet format 
spec](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#backward-compatibility-rules),
 the first thing I notice is that the schema
   
   ```
   required group root {
     repeated double numbers;
   }
   ```
   
   doesn't contain any field tagged with the `LIST` logical type. However, the 
spec also says:
   
   > This does not affect repeated fields that are not annotated: A repeated 
field that is neither contained by a LIST- or MAP-annotated group nor annotated 
by LIST or MAP should be interpreted as a required list of required elements 
where the element type is the type of the field.
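   
   To make that rule concrete, here is a minimal sketch of the check it 
describes. The `Node` type and `isUnannotatedList` are hypothetical stand-ins 
written for illustration, not the `pqarrow` API, and the sketch only looks at 
the direct parent rather than walking the whole schema:

```go
package main

import "fmt"

// Repetition and Annotation are simplified, hypothetical stand-ins for
// Parquet schema metadata; they are not the types used by pqarrow.
type Repetition int

const (
	Required Repetition = iota
	Optional
	Repeated
)

type Annotation int

const (
	None Annotation = iota
	List
	Map
)

// Node is a toy schema node with just enough information for the rule.
type Node struct {
	Name       string
	Repetition Repetition
	Annotation Annotation
	Children   []*Node
}

// isUnannotatedList reports whether node falls under the backward
// compatibility rule: a repeated field that is neither annotated with
// LIST or MAP nor contained by a LIST- or MAP-annotated group.
// Assumption: only the direct parent is inspected.
func isUnannotatedList(node, parent *Node) bool {
	if node.Repetition != Repeated {
		return false
	}
	if node.Annotation == List || node.Annotation == Map {
		return false
	}
	if parent != nil && (parent.Annotation == List || parent.Annotation == Map) {
		return false
	}
	return true
}

func main() {
	// required group root { repeated double numbers; }
	numbers := &Node{Name: "numbers", Repetition: Repeated}
	root := &Node{Name: "root", Repetition: Required, Children: []*Node{numbers}}
	fmt.Println(isUnannotatedList(numbers, root)) // true

	// required group numbers (LIST) { repeated group list { required double element; } }
	element := &Node{Name: "element", Repetition: Required}
	list := &Node{Name: "list", Repetition: Repeated, Children: []*Node{element}}
	annotated := &Node{Name: "numbers", Repetition: Required, Annotation: List, Children: []*Node{list}}
	fmt.Println(isUnannotatedList(list, annotated)) // false
}
```

   In the first case the repeated `numbers` field should be read as a required 
list of required doubles; in the second, the repeated group is just the middle 
layer of an ordinary 3-level list.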
   
   We handle this case here: 
https://github.com/apache/arrow/blob/main/go/parquet/pqarrow/schema.go#L889 
   
   However, I think the writer assumes that we are using the proper 3-level 
list structure, which is what we generate when calling `ToParquet` to convert 
an Arrow schema into a Parquet schema. So without having dug into this yet, my 
initial gut feeling is that the issue is going to be in `path_builder.go`, which 
is where we construct the defLevels and repLevels that we're going to write. 
In theory the defLevels written for this should all be 0, as this should 
get converted into a non-nullable list of non-nullable elements.
   
   Hope this helps, and I'll try to find some time to dig into this further 
next week.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

