[ 
https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583074#comment-17583074
 ] 

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

matthieun commented on code in PR #988:
URL: https://github.com/apache/parquet-mr/pull/988#discussion_r951644453


##########
parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java:
##########
@@ -79,12 +80,20 @@ public MessageType convert(Class<? extends Message> 
protobufClass) {
   }
 
   /* Iterates over list of fields. **/
-  private <T> GroupBuilder<T> convertFields(GroupBuilder<T> groupBuilder, 
List<FieldDescriptor> fieldDescriptors) {
+  private <T> GroupBuilder<T> convertFields(GroupBuilder<T> groupBuilder, 
List<FieldDescriptor> fieldDescriptors, List<String> parentNames) {
     for (FieldDescriptor fieldDescriptor : fieldDescriptors) {
-      groupBuilder =
-          addField(fieldDescriptor, groupBuilder)
+      final String name = fieldDescriptor.getFullName();
+      final List<String> newParentNames = new ArrayList<>(parentNames);
+      newParentNames.add(name);
+      if (parentNames.contains(name)) {

Review Comment:
   The list is mostly used to keep the ordering, so that the dependency chain 
is printed in order in the warning message. I understand that in case the 
schema definition is really deep with nested types it might be slower, but 
overall that list is not growing any bigger than the deepest nesting in the 
schema.
   If this is still a concern, I am happy to switch to HashSet at the expense 
of maybe dumming down the log message (printing the nesting chain out of order 
would not be valuable anyway I think).





> [parquet-protobuf] stack overflow when work with well known json type
> ---------------------------------------------------------------------
>
>                 Key: PARQUET-1711
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1711
>             Project: Parquet
>          Issue Type: Bug
>    Affects Versions: 1.10.1
>            Reporter: Lawrence He
>            Priority: Major
>
> Writing following protobuf message as parquet file is not possible: 
> {code:java}
> syntax = "proto3";
> import "google/protobuf/struct.proto";
> package test;
> option java_outer_classname = "CustomMessage";
> message TestMessage {
>     map<string, google.protobuf.ListValue> data = 1;
> } {code}
> Protobuf introduced "well known json type" such like 
> [ListValue|https://developers.google.com/protocol-buffers/docs/reference/google.protobuf#listvalue]
>  to work around json schema conversion. 
> However writing above messages traps parquet writer into an infinite loop due 
> to the "general type" support in protobuf. Current implementation will keep 
> referencing 6 possible types defined in protobuf (null, bool, number, string, 
> struct, list) and entering infinite loop when referencing "struct".
> {code:java}
> java.lang.StackOverflowErrorjava.lang.StackOverflowError at 
> java.base/java.util.Arrays$ArrayItr.<init>(Arrays.java:4418) at 
> java.base/java.util.Arrays$ArrayList.iterator(Arrays.java:4410) at 
> java.base/java.util.Collections$UnmodifiableCollection$1.<init>(Collections.java:1044)
>  at 
> java.base/java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1043)
>  at 
> org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:64)
>  at 
> org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>  at 
> org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>  at 
> org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>  at 
> org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>  at 
> org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>  at 
> org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>  at 
> org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>  at 
> org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>  at 
> org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to