wgtmac commented on code in PR #43995:
URL: https://github.com/apache/arrow/pull/43995#discussion_r1820982299


##########
cpp/src/parquet/arrow/schema.cc:
##########
@@ -681,6 +681,10 @@ Status ListToSchemaField(const GroupNode& group, LevelInfo current_levels,
       // List of primitive type
       RETURN_NOT_OK(
           NodeToSchemaField(*list_group.field(0), current_levels, ctx, out, child_field));
+    } else if (list_group.field_count() == 1 && list_group.field(0)->is_repeated()) {

Review Comment:
   Rule (3) of the backward-compatibility rules states: `If the repeated field is a group with one field and is named either array or uses the LIST-annotated group's name with _tuple appended then the repeated type is the element type and elements are required.` In other words, **the repeated type is the element type**.
   
   ```
   optional group my_list (LIST) {
     repeated group array {
       required binary str (STRING);
     };
   }
   ```
   So for the schema you mentioned above, the element type is `group array { required binary str (STRING); }`, which resolves exactly to `OneTuple<String>`.
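The sketch below shows in C++ how a reader could decide which backward-compatibility rule applies to a LIST group's single repeated child. Rules (1) and (3) come from the comment above. Rules (2) and (4) are paraphrased from the Parquet format's LogicalTypes.md: a repeated group with several fields is the element, and otherwise the repeated group's only child is the element. `SchemaNode` and `ApplicableListRule` are hypothetical illustration names, not the `parquet::schema::Node` API used in `schema.cc`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy schema node for illustration only (not parquet::schema::Node).
struct SchemaNode {
  std::string name;
  std::string primitive_type;  // e.g. "int32"; empty means this node is a group
  bool is_list = false;        // true if the group carries the LIST annotation
  std::vector<std::shared_ptr<SchemaNode>> children;
  bool is_group() const { return primitive_type.empty(); }
};

// Returns which LIST backward-compatibility rule (1-4) applies to the single
// repeated child of `list_group`. For rules 1-3 the repeated field itself is
// the element type; for rule 4 the repeated group's only child is.
int ApplicableListRule(const SchemaNode& list_group) {
  assert(list_group.is_list && list_group.children.size() == 1 &&
         "expected a LIST group with one repeated child");
  const SchemaNode& repeated = *list_group.children[0];
  if (!repeated.is_group()) return 1;          // repeated primitive
  if (repeated.children.size() > 1) return 2;  // multi-field group
  if (repeated.name == "array" ||
      repeated.name == list_group.name + "_tuple") {
    return 3;                                  // legacy one-field group
  }
  return 4;  // e.g. the standard three-level "list" group
}
```

For the `my_list` schema above, the repeated child is a one-field group named `array`, so this returns 3 and the whole group `array { required binary str (STRING); }` is taken as the element.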
   
   ```
   optional group a (LIST) {
     repeated group array (LIST) {
       repeated int32 array;
     }
   }
   ```
   However, for the schema I mentioned in this issue, the element type is `group array (LIST) { repeated int32 array; }`. Because that element is itself a LIST-annotated group whose repeated field is a primitive, it resolves to `List<int32>` by rule (1): `If the repeated field is not a group, then its type is the element type and elements are required.` The column as a whole is therefore `List<List<int32>>`.
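The C++ sketch below shows the recursion: after the element of the outer LIST group is chosen, it is converted again, and a LIST-annotated element resolves as a nested list. It assumes a hypothetical toy `SchemaNode` type and hypothetical `ToTypeString`/`ListToTypeString` helpers, neither of which is Arrow's `NodeToSchemaField`. It ignores repetition and nullability, and it renders plain groups as `Struct<...>`, a naming chosen only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy schema node for illustration only (not parquet::schema::Node).
struct SchemaNode {
  std::string name;
  std::string primitive_type;  // e.g. "int32"; empty means this node is a group
  bool is_list = false;        // true if the group carries the LIST annotation
  std::vector<std::shared_ptr<SchemaNode>> children;
  bool is_group() const { return primitive_type.empty(); }
};

std::string ToTypeString(const SchemaNode& node);

// Picks the element of a LIST group per the backward-compatibility rules,
// then recurses into it; the element may itself be a LIST group.
std::string ListToTypeString(const SchemaNode& list_group) {
  assert(list_group.children.size() == 1 && "LIST group needs one repeated child");
  const SchemaNode& repeated = *list_group.children[0];
  const bool repeated_is_element =
      !repeated.is_group() ||                       // rule (1)
      repeated.children.size() > 1 ||               // rule (2)
      repeated.name == "array" ||                   // rule (3)
      repeated.name == list_group.name + "_tuple";  // rule (3)
  // Rule (4): otherwise the repeated group's single child is the element.
  assert(repeated_is_element || repeated.children.size() == 1);
  const SchemaNode& element =
      repeated_is_element ? repeated : *repeated.children[0];
  return "List<" + ToTypeString(element) + ">";
}

std::string ToTypeString(const SchemaNode& node) {
  if (node.is_list) return ListToTypeString(node);
  if (!node.is_group()) return node.primitive_type;
  std::string out = "Struct<";
  for (std::size_t i = 0; i < node.children.size(); ++i) {
    if (i > 0) out += ", ";
    out += node.children[i]->name + ": " + ToTypeString(*node.children[i]);
  }
  return out + ">";
}
```

Modeling the schema from this issue as `a (LIST) -> array (LIST) -> repeated int32 array` yields `List<List<int32>>`. The `my_list` example yields `List<Struct<str: String>>`, the same single-field element the comment calls `OneTuple<String>`.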
   
   The parquet-java implementation has interpreted this case in the same way: 
https://github.com/apache/parquet-java/blob/42cf31c0fbe4f000d4ddb1e1092c6634989ea3ca/parquet-avro/src/main/java/org/apache/parquet/avro/AvroSchemaConverter.java#L588



-- 
This is an automated message from the Apache Git Service.