emkornfield commented on code in PR #43995:
URL: https://github.com/apache/arrow/pull/43995#discussion_r1823146507
##########
cpp/src/parquet/arrow/schema.cc:
##########
@@ -651,45 +653,65 @@ Status ListToSchemaField(const GroupNode& group,
LevelInfo current_levels,
int16_t repeated_ancestor_def_level = current_levels.IncrementRepeated();
if (list_node.is_group()) {
- // Resolve 3-level encoding
- //
- // required/optional group name=whatever {
- // repeated group name=list {
- // required/optional TYPE item;
- // }
- // }
- //
- // yields list<item: TYPE ?nullable> ?nullable
- //
- // We distinguish the special case that we have
- //
- // required/optional group name=whatever {
- // repeated group name=array or $SOMETHING_tuple {
- // required/optional TYPE item;
- // }
- // }
- //
- // In this latter case, the inner type of the list should be a struct
- // rather than a primitive value
- //
- // yields list<item: struct<item: TYPE ?nullable> not null> ?nullable
const auto& list_group = static_cast<const GroupNode&>(list_node);
- // Special case mentioned in the format spec:
- // If the name is array or ends in _tuple, this should be a list of
struct
- // even for single child elements.
- if (list_group.field_count() == 1 && !HasStructListName(list_group)) {
- // List of primitive type
- RETURN_NOT_OK(
- NodeToSchemaField(*list_group.field(0), current_levels, ctx, out,
child_field));
- } else {
+ if (list_group.field_count() > 1) {
+ // The inner type of the list should be a struct when there are multiple
fields
+ // in the repeated group
RETURN_NOT_OK(GroupToStruct(list_group, current_levels, ctx, out,
child_field));
+ } else if (list_group.field_count() == 1) {
+ const auto& repeated_field = list_group.field(0);
+ if (repeated_field->is_repeated()) {
Review Comment:
Thanks for the careful explanation on my last questions, after rereading I
think I mostly agree that this is a bug that needs to be fixed. I think I
missed the second (LIST) annotation. Which even though this check corresponds
to the java, code, it seems an important factor here is that the list_group is
in fact a (LIST and maybe even possibly a map), not that the inner element is
repeated?
So I think the logic might make more sense as (pseudocode):
```
if (list_group.field_count() > 1) {
...
} else if (HasListElementName(list_group, group)) {
if (IsMap(list_group) || IsList(list_group)) {
RETURN_NOT_OK(
NodeToSchemaField(list_group, current_levels, ctx, out,
child_field));
} else {
RETURN_NOT_OK(GroupToStruct(list_group, current_levels, ctx, out,
child_field));
}
} else {
...
}
```
Does this formulation work?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]