etseidl commented on PR #466:
URL: https://github.com/apache/parquet-format/pull/466#issuecomment-2479863682
Apologies for muddying the waters, but I'm still trying to get this all
clear in my head. I'm wondering whether, rather than adding a new rule, we can
simply modify Rule 3 to say:
```
If the repeated field is a group with one field, is named either "array" or
uses the LIST-annotated group's name with "_tuple" appended, and would
otherwise be a valid 3-level structure as outlined above, then the repeated
type is the element type and elements are required.
```
The form that triggered all this discussion
```
optional group a (LIST) {
  repeated group array (LIST) {
    repeated int32 array;
  }
}
```
is interpreted by some readers using Rule 3 because the repeated group is
named `array`. If Rule 3 is modified as above, this example would not trigger
Rule 3 due to a) the `repeated` repetition on the inner-most field, and b) the
`LIST` annotation on the repeated group. Using Rule 4, the element type is the
repeated field's type, which in this case is a Rule 1 primitive list, yielding
a nested 2-level list (`List<List<Integer>>`) with non-nullable elements at
both levels.
Without the `LIST` annotation on the repeated group, the above becomes
```
optional group a (LIST) {
  repeated group array {
    repeated int32 array;
  }
}
```
which is not valid at all. It cannot be a 3-level nor Rule 3 list due to the
`repeated` inner element field. Rule 4 could be construed to say this is a
`List<OneTuple<List<Integer>>>`, but that would require mixing `LIST`-annotated
and unannotated lists, which has already been forbidden earlier in the
specification.
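To make the difference concrete, here is a rough Python sketch of how I read the two variants of Rule 3. Everything in it is hypothetical: the `Field` class, `legacy_rule3`, and `proposed_rule3` are names made up for illustration, not anything from parquet-java or another reader. It also assumes that "a valid 3-level structure" means the repeated group is unannotated and its single child is `required` or `optional`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Field:
    """Hypothetical, minimal model of a Parquet schema node."""
    name: str
    repetition: str                    # "required", "optional", or "repeated"
    annotation: Optional[str] = None   # e.g. "LIST"
    children: List["Field"] = field(default_factory=list)  # empty for primitives

    @property
    def is_group(self) -> bool:
        return bool(self.children)


def legacy_rule3(list_group: Field) -> bool:
    """Name-only reading of Rule 3: single-field group named
    "array" or "<list name>_tuple"."""
    repeated = list_group.children[0]
    return (
        repeated.is_group
        and len(repeated.children) == 1
        and repeated.name in ("array", list_group.name + "_tuple")
    )


def proposed_rule3(list_group: Field) -> bool:
    """One reading of the proposed wording: the name check plus
    "would otherwise be a valid 3-level structure"."""
    if not legacy_rule3(list_group):
        return False
    repeated = list_group.children[0]
    inner = repeated.children[0]
    # Assumption: a valid 3-level structure has an unannotated repeated
    # group whose single child is required or optional, never repeated.
    return repeated.annotation is None and inner.repetition in ("required", "optional")


# First example: repeated group carries a LIST annotation.
annotated = Field("a", "optional", "LIST", [
    Field("array", "repeated", "LIST", [Field("array", "repeated")]),
])

# Second example: same shape without the inner LIST annotation.
unannotated = Field("a", "optional", "LIST", [
    Field("array", "repeated", None, [Field("array", "repeated")]),
])

# An ordinary legacy single-field tuple that Rule 3 is meant to cover.
plain_tuple = Field("a", "optional", "LIST", [
    Field("array", "repeated", None, [Field("x", "required")]),
])

for label, schema in [("annotated", annotated),
                      ("unannotated", unannotated),
                      ("plain_tuple", plain_tuple)]:
    print(label, legacy_rule3(schema), proposed_rule3(schema))
```

Under these assumptions the name-only check matches all three schemas, while the modified rule matches only `plain_tuple`. The first example then falls through to Rule 4, and the second is rejected for the reasons given above.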
@rdblue objected to an earlier draft similar to what I'm proposing, but the
new Rule 3 only works because of the `LIST` annotation on the repeated group,
not because the innermost field is `repeated`.