Hi Gang,

I've recently started working on a similar topic, so I'm glad you've brought this up.

I agree, [2] is not much help here. TBH, I am not sure that the current compatibility rules [3] say what they were originally meant to say, and the related examples add to the confusion. (Below I use `@Nullable` where the nullability actually depends on the repetition of the related field.)

> 1. If the repeated field is not a group, then its type is the element type and elements are required.

This one is clear: `@Nullable List<@Nonnull primitive>`

> 2. If the repeated field is a group with multiple fields, then its type is the element type and elements are required.

Quite clear: `@Nullable List<@Nonnull Tuple<...>>` (Note: this is actually a Struct rather than a Tuple.)

> 3. If the repeated field is a group with one field and is named either array or uses the LIST-annotated group's name with _tuple appended then the repeated type is the element type and elements are required.

What does this actually mean? Even with all these very specific naming constraints, we still say "...the repeated type is the element type...", hence: `@Nullable List<@Nonnull OneTuple<...>>`. Even the examples state the same. Why is it different from point 4?

> 4. Otherwise, the repeated field's type is the element type with the repeated field's repetition.

Kind of clear: `@Nullable List<@Nonnull OneTuple<...>>`. But otherwise what? It actually includes the officially expected 3-level list written without the recommended naming convention, which the spec suggests should be accepted. So why do we add the OneTuple?

Instead of having such rules, it would be much better to specify the steps for identifying a structure, starting from the point where a LIST/MAP logical type is encountered and recursing at the element level, so that it is clear how deeply nested structures are specified. We could even extend the current rules. For example, I've seen Parquet schemas with repeated primitives without any LIST logical type. We should accept these as well, as a `@Nonnull List<@Nonnull primitive>`.

WDYT?

Cheers,
Gabor

Gang Wu <ust...@gmail.com> wrote (Wed, Oct 30, 2024, 5:11):

> Hi,
>
> Recently I tried to fix a bug [1] in parquet-cpp, which is having a hard time
> reading a Parquet file written by parquet-java with
> *parquet.avro.write-old-list-structure=true* and with the schema below:
> ```
> optional group a (LIST) {
>   repeated group array (LIST) {
>     repeated int32 array;
>   }
> }
> ```
>
> The question is whether it should be resolved as List<List<Integer>> or
> List<OneTuple<List<Integer>>>. I think it should be the former but the
> answer from parquet-cpp is currently the latter.
>
> It has been explained in [2] but it is not clear on this specific case. I
> have opened a PR to try to clarify it in the spec: [3].
>
> Any feedback is appreciated!
>
> [1] https://github.com/apache/arrow/pull/43995
> [2] https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#backward-compatibility-rules
> [3] https://github.com/apache/parquet-format/pull/466
>
> Best,
> Gang
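
To make the discussion above more concrete, here is a minimal Python sketch of one possible reading of the four backward-compatibility rules, applied to the schema from the original message. It is not taken from parquet-java or parquet-cpp. The `Field` class and the `resolve` / `resolve_list` helpers are hypothetical names invented for this illustration. Two assumptions are baked in and are exactly the points under debate: under rule 3, a repeated group that is itself LIST-annotated is treated as a nested list rather than a one-field struct, and rule 4 is read as the standard 3-level layout where the single child is the element.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Field:
    """Hypothetical, minimal stand-in for a Parquet schema node."""
    name: str
    repetition: str                    # "required", "optional" or "repeated"
    primitive: Optional[str] = None    # e.g. "int32"; None means the node is a group
    annotation: Optional[str] = None   # e.g. "LIST"
    children: List["Field"] = field(default_factory=list)


def resolve(node: Field) -> str:
    """Describe the type of a node, ignoring the node's own repetition."""
    if node.primitive is not None:
        return node.primitive
    if node.annotation == "LIST":
        return resolve_list(node)
    members = ", ".join(f"{c.name}: {resolve(c)}" for c in node.children)
    return f"Struct<{members}>"


def resolve_list(list_group: Field) -> str:
    """Resolve the element type of a LIST-annotated group."""
    (repeated,) = list_group.children  # expects exactly one (repeated) child
    # Rule 1: the repeated field is not a group, so it is the element.
    if repeated.primitive is not None:
        return f"List<{repeated.primitive}>"
    # Rule 2: a repeated group with multiple fields is the element.
    if len(repeated.children) > 1:
        return f"List<{resolve(repeated)}>"
    # Rule 3: a one-field group named "array" or "<list name>_tuple" is the element.
    # Assumption: if that group is LIST-annotated, resolve() treats it as a nested list.
    if repeated.name in ("array", list_group.name + "_tuple"):
        return f"List<{resolve(repeated)}>"
    # Rule 4 (assumed reading): standard 3-level layout, the single child is the element.
    (element,) = repeated.children
    inner = resolve(element)
    if element.repetition == "optional":
        inner = f"Optional<{inner}>"
    return f"List<{inner}>"


# The schema from the original message.
schema = Field("a", "optional", annotation="LIST", children=[
    Field("array", "repeated", annotation="LIST", children=[
        Field("array", "repeated", primitive="int32"),
    ]),
])
print(resolve(schema))  # prints "List<List<int32>>"
```

Under these assumptions the schema resolves to a list of lists. Dropping the LIST-annotation assumption in rule 3 would instead give a list of one-field structs wrapping a list, which appears to be the source of the disagreement.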