etseidl commented on PR #466:
URL: https://github.com/apache/parquet-format/pull/466#issuecomment-2479863682

   Apologies for muddying the waters, but I'm still trying to get this all 
clear in my head. Rather than adding a new rule, could we simply modify Rule 3 
to say:
   ```
   If the repeated field is a group with one field, is named either "array" or
   uses the LIST-annotated group's name with "_tuple" appended, and would
   otherwise be a valid 3-level structure as outlined above, then the repeated
   type is the element type and elements are required.
   ```
   The form that triggered all this discussion
   ```
   optional group a (LIST) {
     repeated group array (LIST) {
       repeated int32 array;
     }
   }
   ```
   is interpreted by some readers using Rule 3 because the repeated group is 
named `array`. If Rule 3 is modified as above, this example would not trigger 
Rule 3 due to a) the `repeated` repetition on the innermost field, and b) the 
`LIST` annotation on the repeated group. Using Rule 4, the element type is the 
repeated field's type, which in this case is a Rule 1 primitive list, yielding 
a nested 2-level list (`List<List<Integer>>`) with non-nullable elements at 
both levels.
   
   Without the `LIST` annotation on the repeated group, the above becomes
   ```
   optional group a (LIST) {
     repeated group array {
       repeated int32 array;
     }
   }
   ```
   which is not valid at all. It can be neither a 3-level list nor a Rule 3 
list because the inner element field is `repeated`. Rule 4 could be construed 
to say this is a `List<OneTuple<List<Integer>>>`, but that would require mixing 
`LIST`-annotated and unannotated lists, which has already been forbidden 
earlier in the specification.
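
   To make the two checks concrete, here is a rough sketch of how a reader 
might test the modified Rule 3. This is only my reading of the wording above, 
not anything from the spec or an existing reader: the `Node` class and 
`proposed_rule3_applies` are hypothetical names, and I'm assuming "would 
otherwise be a valid 3-level structure" means the repeated group carries no 
`LIST` annotation and its single child is not itself `repeated`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical, minimal stand-in for a Parquet schema node."""
    name: str
    repetition: str                      # "required", "optional" or "repeated"
    annotation: Optional[str] = None     # e.g. "LIST"
    children: List["Node"] = field(default_factory=list)  # empty for primitives


def proposed_rule3_applies(list_group: Node) -> bool:
    """Return True if the modified Rule 3 (as I read it) matches.

    `list_group` is the outer LIST-annotated group; its only child is the
    repeated field.
    """
    repeated = list_group.children[0]
    # Rule 3 only concerns repeated groups with exactly one field.
    if len(repeated.children) != 1:
        return False
    legacy_name = repeated.name in ("array", list_group.name + "_tuple")
    inner = repeated.children[0]
    # Assumption: "otherwise a valid 3-level structure" means
    #   a) the inner field is not `repeated`, and
    #   b) the repeated group is not itself LIST-annotated.
    three_level_shape = (
        inner.repetition != "repeated" and repeated.annotation is None
    )
    return legacy_name and three_level_shape


# The form that triggered the discussion (repeated group annotated with LIST).
annotated = Node("a", "optional", "LIST", [
    Node("array", "repeated", "LIST", [Node("array", "repeated")]),
])

# The same form without the LIST annotation on the repeated group.
unannotated = Node("a", "optional", "LIST", [
    Node("array", "repeated", None, [Node("array", "repeated")]),
])

# A legacy single-field group named "array" with a non-repeated child.
legacy = Node("a", "optional", "LIST", [
    Node("array", "repeated", None, [Node("x", "required")]),
])

for label, schema in [("annotated", annotated),
                      ("unannotated", unannotated),
                      ("legacy", legacy)]:
    print(label, proposed_rule3_applies(schema))
```

   Under those assumptions neither of the two forms above triggers Rule 3, 
while a legacy single-field `array` group with a non-repeated child still 
does. The sketch says nothing about whether the unannotated form is valid; it 
only shows that it falls through Rule 3.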
   
   @rdblue objected to an earlier draft similar to what I'm proposing, but the 
new Rule 3 only works because of the `LIST` annotation on the repeated group, 
not because the innermost field is `repeated`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

