[ 
https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611705#comment-17611705
 ] 

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

jinyius commented on code in PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#discussion_r984838003


##########
parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java:
##########
@@ -124,35 +164,61 @@ private <T> Builder<? extends Builder<?, 
GroupBuilder<T>>, GroupBuilder<T>> addR
         .named("list");
   }
 
-  private <T> GroupBuilder<GroupBuilder<T>> addRepeatedMessage(FieldDescriptor 
descriptor, GroupBuilder<T> builder) {
-    GroupBuilder<GroupBuilder<GroupBuilder<GroupBuilder<T>>>> result =
-      builder
+  private <T> GroupBuilder<GroupBuilder<T>> addRepeatedMessage(FieldDescriptor 
descriptor, GroupBuilder<T> builder, ImmutableSetMultimap<String, Integer> 
seen, int depth) {
+    GroupBuilder<GroupBuilder<GroupBuilder<GroupBuilder<T>>>> result = builder
         .group(Type.Repetition.OPTIONAL).as(listType())
         .group(Type.Repetition.REPEATED)
         .group(Type.Repetition.OPTIONAL);
 
-    convertFields(result, descriptor.getMessageType().getFields());
+    convertFields(result, descriptor.getMessageType().getFields(), seen, 
depth);
 
     return result.named("element").named("list");
   }
 
-  private <T> GroupBuilder<GroupBuilder<T>> addMessageField(FieldDescriptor 
descriptor, final GroupBuilder<T> builder) {
+  private <T> Builder<? extends Builder<?, GroupBuilder<T>>, GroupBuilder<T>> 
addMessageField(FieldDescriptor descriptor, final GroupBuilder<T> builder, 
ImmutableSetMultimap<String, Integer> seen, int depth) {
+    // Prevent recursion by terminating with optional proto bytes.
+    depth += 1;
+    String typeName = getInnerTypeName(descriptor);
+    LOG.trace("addMessageField: " + descriptor.getFullName() + " type: " + 
typeName + " depth: " + depth);
+    if (typeName != null) {
+      if (seen.get(typeName).size() > maxRecursion) {
+        return builder.primitive(BINARY, 
Type.Repetition.OPTIONAL).as((LogicalTypeAnnotation) null);
+      }
+    }
+
     if (descriptor.isMapField() && parquetSpecsCompliant) {
       // the old schema style did not include the MAP wrapper around map groups
-      return addMapField(descriptor, builder);
+      return addMapField(descriptor, builder, seen, depth);
     }
+
+    seen = ImmutableSetMultimap.<String, 
Integer>builder().putAll(seen).put(typeName, depth).build();

Review Comment:
   it's actually not as costly as you think.  guava's immutable structures are 
written to simply remove method access what not needed, and takes tries its 
best to avoid memory reallocations when using copyOf or builder patterns 
[[1](https://github.com/google/guava/wiki/ImmutableCollectionsExplained)][[2](https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/ImmutableSetMultimap.java#L365)][[3](https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/ImmutableSetMultimap.java#L306)][[4](https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/ImmutableSetMultimap.java#L291)]....
 
[generally](https://stackoverflow.com/questions/1284727/mutable-or-immutable-class).
  it's pretty 
[efficient](https://github.com/DimitrisAndreou/memory-measurer/blob/master/ElementCostInDataStructures.txt).
  because of depth first traversal, we do want to "go back" and let the 
previous state of counts start again as the basis for other branch traversals.  
this is exactly the benefit as it helps in avoiding defensive copying of 
mutable data structures or clearing of fields trying to use a single instance 
when traversing and going back up the stack.





> [parquet-protobuf] stack overflow when work with well known json type
> ---------------------------------------------------------------------
>
>                 Key: PARQUET-1711
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1711
>             Project: Parquet
>          Issue Type: Bug
>    Affects Versions: 1.10.1
>            Reporter: Lawrence He
>            Priority: Major
>
> Writing following protobuf message as parquet file is not possible: 
> {code:java}
> syntax = "proto3";
> import "google/protobuf/struct.proto";
> package test;
> option java_outer_classname = "CustomMessage";
> message TestMessage {
>     map<string, google.protobuf.ListValue> data = 1;
> } {code}
> Protobuf introduced "well known json type" such like 
> [ListValue|https://developers.google.com/protocol-buffers/docs/reference/google.protobuf#listvalue]
>  to work around json schema conversion. 
> However writing above messages traps parquet writer into an infinite loop due 
> to the "general type" support in protobuf. Current implementation will keep 
> referencing 6 possible types defined in protobuf (null, bool, number, string, 
> struct, list) and entering infinite loop when referencing "struct".
> {code:java}
> java.lang.StackOverflowErrorjava.lang.StackOverflowError at 
> java.base/java.util.Arrays$ArrayItr.<init>(Arrays.java:4418) at 
> java.base/java.util.Arrays$ArrayList.iterator(Arrays.java:4410) at 
> java.base/java.util.Collections$UnmodifiableCollection$1.<init>(Collections.java:1044)
>  at 
> java.base/java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1043)
>  at 
> org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:64)
>  at 
> org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>  at 
> org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>  at 
> org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>  at 
> org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>  at 
> org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>  at 
> org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>  at 
> org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>  at 
> org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>  at 
> org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to