wgtmac commented on code in PR #466:
URL: https://github.com/apache/parquet-format/pull/466#discussion_r1828608033


##########
LogicalTypes.md:
##########
@@ -648,49 +650,92 @@ optional group my_list (LIST) {
 }
 ```
 
-Some existing data does not include the inner element layer. For
-backward-compatibility, the type of elements in `LIST`-annotated structures
-should always be determined by the following rules:
+##### 2-level structure
+
+Some existing data does not include the inner element layer, meaning that `LIST`
+annotates a 2-level structure. In contrast to 3-level structure, the repetition
+of 2-level structure can be `optional`, `required`, or `repeated`.
+
+```
+<list-repetition> group <name> (LIST) {
+  repeated <element-type> <element-name>;
+}
+```
+
+For backward-compatibility, the type of elements in `LIST`-annotated 2-level
+structures should always be determined by the following rules:
 
 1. If the repeated field is not a group, then its type is the element type and
    elements are required.
-2. If the repeated field is a group with multiple fields, then its type is the
-   element type and elements are required.
-3. If the repeated field is a group with one field and is named either `array`
-   or uses the `LIST`-annotated group's name with `_tuple` appended then the
-   repeated type is the element type and elements are required.
-4. Otherwise, the repeated field's type is the element type with the repeated
-   field's repetition.
+2. If the repeated field is a group with multiple fields, then its type (Struct
+   type with multiple fields) is the element type and elements are required.
+3. If the repeated field is a group with one `required` or `optional` field,
+   and is named either `array` or uses the `LIST`-annotated group's name with
+   `_tuple` appended, then the repeated type (Struct type with single field) is
+   the element type and elements are required.
+4. If the repeated field is a `LIST`-annotated group with one `repeated` field,
+   then the element type is a list type with 2-level structure and elements are
+   required.
+5. Otherwise, the repeated field's type is the element type with the repeated
+   field's repetition. Please note that the repeated field here (a group with
+   one field) cannot be `LIST`-annotated or `MAP`-annotated 3-level structure,
+   as such a group's repetition must be `required` or `optional`.
 
 Examples that can be interpreted using these rules:
 
 ```
-// List<Integer> (nullable list, non-null elements)
+// Rule 1: List<Integer> (nullable list, non-null elements)
 optional group my_list (LIST) {
   repeated int32 element;
 }
 
-// List<Tuple<String, Integer>> (nullable list, non-null elements)
+// Rule 2: List<Struct<String, Integer>> (nullable list, non-null elements)
 optional group my_list (LIST) {
   repeated group element {
     required binary str (STRING);
     required int32 num;
   };
 }
 
-// List<OneTuple<String>> (nullable list, non-null elements)
+// Rule 3: List<Struct<String>> (nullable list, non-null elements)
 optional group my_list (LIST) {
   repeated group array {
     required binary str (STRING);
   };
 }
 
-// List<OneTuple<String>> (nullable list, non-null elements)
+// Rule 3: List<Struct<String>> (nullable list, non-null elements)
 optional group my_list (LIST) {
   repeated group my_list_tuple {
     required binary str (STRING);
   };
 }
+
+// List<List<Integer>>
+// Rule 4: nullable outer list with non-null elements
+// Rule 1: non-null inner list with non-null elements
+optional group my_list (LIST) {
+  repeated group array (LIST) {
+    repeated int32 array;
+  }
+}
+```
+
+##### 1-level structure without `LIST` annotation
+
+Some existing data does not even have the `LIST` annotation and simply uses
+`repeated` repetition to annotate the element type. In this case, the element
+type MUST be a primitive type and both the list and elements are required.

Review Comment:
   > AFAICT using repeated without the LIST annotation is still supported by the spec
   
   I don't think the spec supports it, because the spec is not yet clear about it. The official list type in the spec is the LIST-annotated group with 3-level structure, which supports arbitrary nesting and can fully specify the nullability of each level. Writers should always use the LIST-annotated group with 3-level structure; the other forms fall into the category of backward compatibility, for dealing with existing files. That a writer can accidentally produce such files does not mean it should be that way.
   
   > Also, I don't think the current [wording](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#nested-types) requires a primitive type
   
   That's true. Let me change this.
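
For reference, here is a minimal Python sketch of how a reader might apply the five 2-level rules from the hunk above. `Field` and `resolve_element` are hypothetical names made up for illustration, not part of any Parquet library. The handling of rule 5 (treating the single child of the repeated group as the element, with that child's repetition) is one reading of the wording, not something the hunk spells out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Field:
    """Hypothetical, minimal stand-in for a Parquet schema node."""
    name: str
    repetition: str                   # "required", "optional", or "repeated"
    type: str = "group"               # primitive type name, or "group"
    annotation: Optional[str] = None  # e.g. "LIST"
    children: List["Field"] = field(default_factory=list)


def resolve_element(list_group: Field) -> Tuple[int, Field, str]:
    """Return (rule number, element field, element repetition)."""
    if list_group.annotation != "LIST" or len(list_group.children) != 1:
        raise ValueError("expected a LIST-annotated group with one field")
    repeated = list_group.children[0]
    if repeated.repetition != "repeated":
        raise ValueError("the field inside a LIST group must be repeated")

    # Rule 1: a repeated primitive is itself the element.
    if repeated.type != "group":
        return 1, repeated, "required"
    if not repeated.children:
        raise ValueError("a repeated group must have at least one field")
    # Rule 2: a group with multiple fields is a struct element.
    if len(repeated.children) > 1:
        return 2, repeated, "required"

    only = repeated.children[0]
    legacy_names = ("array", list_group.name + "_tuple")
    # Rule 3: legacy-named group with one required/optional field.
    if only.repetition in ("required", "optional") and repeated.name in legacy_names:
        return 3, repeated, "required"
    # Rule 4: LIST-annotated group with one repeated field (nested 2-level list).
    if repeated.annotation == "LIST" and only.repetition == "repeated":
        return 4, repeated, "required"
    # Rule 5 (assumed reading): the single child is the element.
    return 5, only, only.repetition


# The List<List<Integer>> example from the hunk: rule 4, then rule 1.
inner = Field("array", "repeated", annotation="LIST",
              children=[Field("array", "repeated", type="int32")])
outer = Field("my_list", "optional", annotation="LIST", children=[inner])
rule, element, repetition = resolve_element(outer)
print(rule, element.name, repetition)  # prints: 4 array required
```

Calling `resolve_element(inner)` on the nested group then hits rule 1, which matches the "Rule 4 / Rule 1" annotations on that example.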
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

