zeroshade commented on PR #37817: URL: https://github.com/apache/arrow/pull/37817#issuecomment-1731767862
I'll try to dig into this a bit when I have the chance. Offhand, the fact that the defLevels are different would be expected, as the schema is different and has fewer levels in it. That said, I think the issue is that a single repeated primitive node is simply not being seen as a LIST field.

If we look at the [parquet format spec](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#backward-compatibility-rules), the first thing I notice is that the schema

```
required group root {
  repeated double numbers;
}
```

doesn't contain any field tagged with the `LIST` logical type. However, the spec also says:

> This does not affect repeated fields that are not annotated: A repeated field that is neither contained by a LIST- or MAP-annotated group nor annotated by LIST or MAP should be interpreted as a required list of required elements where the element type is the type of the field.

We handle this case here: https://github.com/apache/arrow/blob/main/go/parquet/pqarrow/schema.go#L889

However, I think the writer assumes that we are using the proper 3-level annotated structure, which is what we generate when calling `ToParquet` to convert an Arrow schema into a Parquet schema. So without having dug into this yet, my initial gut feeling is that the issue is going to be in `path_builder.go`, which is where we construct the def levels and rep levels that we're going to write. In theory, the resulting defLevels to write this should be all 0, as this should get converted into a non-nullable list of non-nullable elements.

Hope this helps, and I'll try to find some time to dig into this further next week.
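To make the backward-compatibility rule quoted above concrete, here is a minimal sketch of the check it describes. This is not the pqarrow code: the `node` type, the `annotation` constants, and the `isImplicitList` helper are all hypothetical names made up for illustration, and the sketch assumes that "contained by" refers to the direct parent group only.

```go
package main

import "fmt"

// annotation is a hypothetical stand-in for a Parquet logical type annotation.
type annotation int

const (
	none annotation = iota
	list
	mapType
)

// node is a hypothetical, simplified schema node. It is not the type used
// by the Arrow Go Parquet implementation.
type node struct {
	name     string
	repeated bool
	logical  annotation
	parent   *node
}

// isImplicitList reports whether n falls under the quoted rule: a repeated
// field that is neither annotated with LIST or MAP, nor contained by a
// LIST- or MAP-annotated group, is read as a required list of required
// elements. Assumption: only the direct parent is checked.
func isImplicitList(n *node) bool {
	if !n.repeated || n.logical != none {
		return false
	}
	if n.parent != nil && n.parent.logical != none {
		return false
	}
	return true
}

func main() {
	// required group root { repeated double numbers; }
	root := &node{name: "root"}
	numbers := &node{name: "numbers", repeated: true, parent: root}
	fmt.Println(isImplicitList(numbers)) // true

	// Annotated 3-level form: the repeated group sits inside a LIST group,
	// so the implicit-list rule does not apply to it.
	annotated := &node{name: "numbers", logical: list, parent: root}
	inner := &node{name: "list", repeated: true, parent: annotated}
	fmt.Println(isImplicitList(inner)) // false
}
```

The first case is the schema from this issue, which is the shape the writer would also need to recognize if the gut feeling about `path_builder.go` turns out to be right.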
