wgtmac commented on code in PR #43995:
URL: https://github.com/apache/arrow/pull/43995#discussion_r1823796150


##########
cpp/src/parquet/arrow/schema.cc:
##########
@@ -651,45 +653,65 @@ Status ListToSchemaField(const GroupNode& group, 
LevelInfo current_levels,
 
   int16_t repeated_ancestor_def_level = current_levels.IncrementRepeated();
   if (list_node.is_group()) {
-    // Resolve 3-level encoding
-    //
-    // required/optional group name=whatever {
-    //   repeated group name=list {
-    //     required/optional TYPE item;
-    //   }
-    // }
-    //
-    // yields list<item: TYPE ?nullable> ?nullable
-    //
-    // We distinguish the special case that we have
-    //
-    // required/optional group name=whatever {
-    //   repeated group name=array or $SOMETHING_tuple {
-    //     required/optional TYPE item;
-    //   }
-    // }
-    //
-    // In this latter case, the inner type of the list should be a struct
-    // rather than a primitive value
-    //
-    // yields list<item: struct<item: TYPE ?nullable> not null> ?nullable
     const auto& list_group = static_cast<const GroupNode&>(list_node);
-    // Special case mentioned in the format spec:
-    //   If the name is array or ends in _tuple, this should be a list of 
struct
-    //   even for single child elements.
-    if (list_group.field_count() == 1 && !HasStructListName(list_group)) {
-      // List of primitive type
-      RETURN_NOT_OK(
-          NodeToSchemaField(*list_group.field(0), current_levels, ctx, out, 
child_field));
-    } else {
+    if (list_group.field_count() > 1) {
+      // The inner type of the list should be a struct when there are multiple 
fields
+      // in the repeated group
       RETURN_NOT_OK(GroupToStruct(list_group, current_levels, ctx, out, 
child_field));
+    } else if (list_group.field_count() == 1) {
+      const auto& repeated_field = list_group.field(0);
+      if (repeated_field->is_repeated()) {

Review Comment:
   No, this does not work. It has two issues:
   - MAP annotation will never happen. I have explained at 
https://github.com/apache/parquet-format/pull/466#issuecomment-2449039626. Same 
for the non-legacy three-level list.
   - `ListToSchemaField()` will throw at 
https://github.com/apache/arrow/blob/3917b605ef1851ad036fb62856efcc83e32f8580/cpp/src/parquet/arrow/schema.cc#L635
 due to the spec 
https://github.com/apache/parquet-format/blob/3478990aa74ffe1c76ab53e7ed8cf8f4f609c7a3/LogicalTypes.md?plain=1#L594,
 which is too strict. I have changed it to check for only non-legacy list.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to