gszadovszky commented on code in PR #466:
URL: https://github.com/apache/parquet-format/pull/466#discussion_r1839842596


##########
LogicalTypes.md:
##########
@@ -684,49 +686,99 @@ optional group my_list (LIST) {
 }
 ```
 
-Some existing data does not include the inner element layer. For
-backward-compatibility, the type of elements in `LIST`-annotated structures
-should always be determined by the following rules:
+##### 2-level structure
+
+Some existing data does not include the inner element layer, meaning that 
`LIST`
+annotates a 2-level structure. In contrast to 3-level structure, the repetition
+of 2-level structure can be `optional`, `required`, or `repeated`.
+
+```
+<list-repetition> group <name> (LIST) {
+  repeated <element-type> <element-name>;
+}
+```
+
+For backward-compatibility, the type of elements in `LIST`-annotated 2-level
+structures should always be determined by the following rules:
 
 1. If the repeated field is not a group, then its type is the element type and
    elements are required.
 2. If the repeated field is a group with multiple fields, then its type is the
    element type and elements are required.
-3. If the repeated field is a group with one field and is named either `array`
+3. If the repeated field is a group with a `repeated` field, then the repeated
+   field is the element type because the type cannot be a 3-level list.
+4. If the repeated field is a group with one field and is named either `array`
    or uses the `LIST`-annotated group's name with `_tuple` appended then the
    repeated type is the element type and elements are required.
-4. Otherwise, the repeated field's type is the element type with the repeated
+5. Otherwise, the repeated field's type is the element type with the repeated
    field's repetition.
 
 Examples that can be interpreted using these rules:
 
 ```
-// List<Integer> (nullable list, non-null elements)
+// Rule 1: List<Integer> (nullable list, non-null elements)
 optional group my_list (LIST) {
   repeated int32 element;
 }
 
-// List<Tuple<String, Integer>> (nullable list, non-null elements)
+// Rule 2: List<Tuple<String, Integer>> (nullable list, non-null elements)
 optional group my_list (LIST) {
   repeated group element {
     required binary str (STRING);
     required int32 num;
   };
 }
 
-// List<OneTuple<String>> (nullable list, non-null elements)
+// Rule 3: List<List<Integer>> (nullable outer list, non-null elements)
+optional group my_list (LIST) {
+  repeated group array (LIST) {
+    repeated int32 array;
+  };
+}
+
+// Rule 4: List<OneTuple<String>> (nullable list, non-null elements)
 optional group my_list (LIST) {
   repeated group array {
     required binary str (STRING);
   };
 }
 
-// List<OneTuple<String>> (nullable list, non-null elements)
+// Rule 4: List<OneTuple<String>> (nullable list, non-null elements)
 optional group my_list (LIST) {
   repeated group my_list_tuple {
     required binary str (STRING);
   };
 }
+
+// Rule 5: List<OneTuple<List<Integer>>> (nullable outer list, non-null 
elements)
+optional group my_list (LIST) {
+  repeated group foo {
+    repeated int32 bar;
+  };
+}
+```
+
+##### 1-level structure without `LIST` annotation
+
+Some existing data does not even have the `LIST` annotation and simply uses
+`repeated` repetition to annotate the element type. For backward-compatibility,
+both the list and elements are `required`.
+
+```
+// List<Integer> (non-null list, non-null elements)
+repeated int32 num;
+
+// Tuple<List<Integer>, List<String>> (non-null list, non-null elements)
+optional group my_list {
+  repeated int32 num;
+  repeated binary str (STRING);
+}
+
+// List<Tuple<Integer, String>> (non-null list, non-null elements)
+repeated group my_list {
+  required int32 num;
+  optional binary str (STRING);
+}
 ```

Review Comment:
   As @etseidl mentioned before, there is a 
[section](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md?plain=1#L604-L614)
 just before lists that clearly states that the repeated fields without 
annotation is supported. So, it is misleading that we explain the "1-level 
structure" at the backward compatibility section. Maybe move the examples to 
the existing section and keep only a reference that states the "1-level 
structure" is actually a 1st class citizen in the spec (and not deprecated).
   
   Or actually deprecate it now? Iceberg seems to only support the 3-level 
list. Meanwhile, there are engines still write the 1-level one, like 
parquet-rs. Even parquet-java writes these in case of the protobuf binding is 
used.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to