This is an automated email from the ASF dual-hosted git repository.

gangwu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-format.git


The following commit(s) were added to refs/heads/master by this push:
     new 5740bf1  GH-465: Clarify backward-compatibility rules on LIST type (#466)
5740bf1 is described below

commit 5740bf175ecaba5fe269505b35c4eb962c46d3e4
Author: Gang Wu <[email protected]>
AuthorDate: Sun Dec 8 11:47:06 2024 +0800

    GH-465: Clarify backward-compatibility rules on LIST type (#466)
    
    Co-authored-by: Ed Seidl <[email protected]>
    Co-authored-by: Antoine Pitrou <[email protected]>
    Co-authored-by: emkornfield <[email protected]>
---
 LogicalTypes.md | 65 +++++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 54 insertions(+), 11 deletions(-)

diff --git a/LogicalTypes.md b/LogicalTypes.md
index 7b4b203..7294015 100644
--- a/LogicalTypes.md
+++ b/LogicalTypes.md
@@ -609,9 +609,23 @@ that is neither contained by a `LIST`- or `MAP`-annotated group nor annotated
 by `LIST` or `MAP` should be interpreted as a required list of required
 elements where the element type is the type of the field.
 
-Implementations should use either `LIST` and `MAP` annotations _or_ unannotated
-repeated fields, but not both. When using the annotations, no unannotated
-repeated types are allowed.
+```
+WARNING: writers should not produce list types like these examples! They are
+just for the purpose of reading existing data for backward-compatibility.
+
+// List<Integer> (non-null list, non-null elements)
+repeated int32 num;
+
+// List<Tuple<Integer, String>> (non-null list, non-null elements)
+repeated group my_list {
+  required int32 num;
+  optional binary str (STRING);
+}
+```
+
+For all fields in the schema, implementations should use either `LIST` and
+`MAP` annotations _or_ unannotated repeated fields, but not both. When using
+the annotations, no unannotated repeated types are allowed.
 
 ### Lists
 
@@ -670,6 +684,11 @@ optional group array_of_arrays (LIST) {
 
 #### Backward-compatibility rules
 
+New writer implementations should always produce the 3-level LIST structure shown
+above. However, historically data files have been produced that use different
+structures to represent list-like data, and readers may include compatibility
+measures to interpret them as intended.
+
 It is required that the repeated group of elements is named `list` and that
 its element field is named `element`. However, these names may not be used in
 existing data and should not be enforced as errors when reading. For example,
@@ -684,29 +703,39 @@ optional group my_list (LIST) {
 }
 ```
 
-Some existing data does not include the inner element layer. For
-backward-compatibility, the type of elements in `LIST`-annotated structures
+Some existing data does not include the inner element layer, resulting in a
+`LIST` that annotates a 2-level structure. Unlike the 3-level structure, the
+repetition of a 2-level structure can be `optional`, `required`, or `repeated`.
+When it is `repeated`, the `LIST`-annotated 2-level structure can only serve as
+an element within another `LIST`-annotated 2-level structure.
+
+For backward-compatibility, the type of elements in `LIST`-annotated structures
 should always be determined by the following rules:
 
 1. If the repeated field is not a group, then its type is the element type and
    elements are required.
 2. If the repeated field is a group with multiple fields, then its type is the
    element type and elements are required.
-3. If the repeated field is a group with one field and is named either `array`
+3. If the repeated field is a group with one field with `repeated` repetition,
+   then its type is the element type and elements are required.
+4. If the repeated field is a group with one field and is named either `array`
    or uses the `LIST`-annotated group's name with `_tuple` appended then the
    repeated type is the element type and elements are required.
-4. Otherwise, the repeated field's type is the element type with the repeated
+5. Otherwise, the repeated field's type is the element type with the repeated
    field's repetition.
 
 Examples that can be interpreted using these rules:
 
 ```
-// List<Integer> (nullable list, non-null elements)
+WARNING: writers should not produce list types like these examples! They are
+just for the purpose of reading existing data for backward-compatibility.
+
+// Rule 1: List<Integer> (nullable list, non-null elements)
 optional group my_list (LIST) {
   repeated int32 element;
 }
 
-// List<Tuple<String, Integer>> (nullable list, non-null elements)
+// Rule 2: List<Tuple<String, Integer>> (nullable list, non-null elements)
 optional group my_list (LIST) {
   repeated group element {
     required binary str (STRING);
@@ -714,19 +743,33 @@ optional group my_list (LIST) {
   };
 }
 
-// List<OneTuple<String>> (nullable list, non-null elements)
+// Rule 3: List<List<Integer>> (nullable outer list, non-null elements)
+optional group my_list (LIST) {
+  repeated group array (LIST) {
+    repeated int32 array;
+  };
+}
+
+// Rule 4: List<OneTuple<String>> (nullable list, non-null elements)
 optional group my_list (LIST) {
   repeated group array {
     required binary str (STRING);
   };
 }
 
-// List<OneTuple<String>> (nullable list, non-null elements)
+// Rule 4: List<OneTuple<String>> (nullable list, non-null elements)
 optional group my_list (LIST) {
   repeated group my_list_tuple {
     required binary str (STRING);
   };
 }
+
+// Rule 5: List<String>  (nullable list, nullable elements)
+optional group my_list (LIST) {
+  repeated group element {
+    optional binary str (STRING);
+  };
+}
 ```
 
 ### Maps
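
The first hunk keeps the rule that a repeated field outside any `LIST`/`MAP` context is read as a required list of required elements, whose element type is the field's own type. Below is a minimal Python sketch of that reading rule. The `Node` and `ListView` classes and the `unannotated_repeated_as_list` function are hypothetical illustrations, not part of parquet-format or any Parquet library. The sketch also assumes that "contained by a `LIST`- or `MAP`-annotated group" refers to the field's immediate parent group.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical, simplified schema node (not a real Parquet library API)."""
    name: str
    repetition: str                          # "required", "optional" or "repeated"
    annotation: Optional[str] = None         # "LIST", "MAP" or None
    children: Optional[List["Node"]] = None  # None means a primitive column


@dataclass
class ListView:
    list_required: bool      # whether the list itself is non-null
    element: Node            # node whose type is the element type
    element_required: bool   # whether elements are non-null


def unannotated_repeated_as_list(node: Node,
                                 parent_annotation: Optional[str]) -> Optional[ListView]:
    """Read a bare repeated field as a required list of required elements.

    Returns None when the rule does not apply: the field is not repeated,
    it is itself LIST/MAP-annotated, or its parent group (assumed to be what
    "contained by" means) is LIST/MAP-annotated, so the LIST/MAP rules apply.
    """
    if node.repetition != "repeated":
        return None
    if node.annotation in ("LIST", "MAP") or parent_annotation in ("LIST", "MAP"):
        return None
    # The element type is the type of the repeated field itself.
    return ListView(list_required=True, element=node, element_required=True)


# Usage with the two legacy examples from the first hunk.
num = Node("num", "repeated")
pair = Node("my_list", "repeated", children=[
    Node("num", "required"),
    Node("str", "optional"),
])
for f in (num, pair):
    view = unannotated_repeated_as_list(f, parent_annotation=None)
    print(f.name, view.list_required, view.element_required)
# prints "num True True", then "my_list True True"
```

Returning `None` rather than raising keeps the helper usable as one check in a larger schema walk, where most fields are not bare repeated fields.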

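The five backward-compatibility rules for `LIST`-annotated groups are order-sensitive, so a reader has to test them from top to bottom. Below is a Python sketch that applies them to a hypothetical, simplified schema model. `Node` and `resolve_list_element` are illustrative names, not a real Parquet API. For rule 5 it follows the accompanying "Rule 5" example: the single field inside the repeated group is the element, and that field's own repetition decides nullability.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical, simplified schema node (not a real Parquet library API)."""
    name: str
    repetition: str                          # "required", "optional" or "repeated"
    annotation: Optional[str] = None         # "LIST", "MAP" or None
    children: Optional[List["Node"]] = None  # None means a primitive column

    @property
    def is_group(self) -> bool:
        return self.children is not None


def resolve_list_element(list_node: Node) -> Tuple[Node, bool, int]:
    """Return (element node, elements required?, rule number) for a LIST group."""
    if list_node.annotation != "LIST" or not list_node.children or len(list_node.children) != 1:
        raise ValueError("expected a LIST-annotated group with exactly one child")
    repeated = list_node.children[0]
    if repeated.repetition != "repeated":
        raise ValueError("the child of a LIST-annotated group must be repeated")

    # Rule 1: a primitive repeated field is the element; elements are required.
    if not repeated.is_group:
        return repeated, True, 1
    if not repeated.children:
        raise ValueError("empty groups are not valid")
    # Rule 2: a repeated group with several fields is the element (a tuple).
    if len(repeated.children) > 1:
        return repeated, True, 2
    only = repeated.children[0]
    # Rule 3: the group's single field is itself repeated (nested 2-level list).
    if only.repetition == "repeated":
        return repeated, True, 3
    # Rule 4: legacy names mark the repeated group as a one-field tuple element.
    if repeated.name in ("array", list_node.name + "_tuple"):
        return repeated, True, 4
    # Rule 5: 3-level layout; the inner field is the element and its own
    # repetition decides whether elements may be null.
    return only, only.repetition == "required", 5


# Usage: the "Rule 3" example, a 2-level LIST nested inside a 2-level LIST.
inner = Node("array", "repeated", "LIST", [Node("array", "repeated")])
outer = Node("my_list", "optional", "LIST", [inner])
element, required, rule = resolve_list_element(outer)
print(rule, element.name, required)       # prints: 3 array True
print(resolve_list_element(element)[2])   # prints: 1 (inner list uses rule 1)
```

Rule 3 is checked before rule 5 for a reason. Without that check, a repeated group whose only field is also repeated would fall through to rule 5. The reader would then take the inner repeated field as the element and lose the nested list level.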