I have a Protocol Buffers message defined as follows:

message A {
  optional string id = 1;
  repeated B extension = 2;
}

The B message is:

message B {
  optional string id = 1;
  repeated B extension = 2;
}
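To make the self-reference concrete, here is a minimal, stdlib-only Java sketch. It does not use the protobuf Descriptors API. The MessageType class and the exampleTypes() and findCycle() helpers are hypothetical stand-ins that model each message as a map from field name to the message type it refers to. A depth-first walk that tracks the current path finds the cycle A -> B -> B, which is the path a naive schema converter keeps following.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RecursionCheck {
    // Hypothetical stand-in for a protobuf message descriptor: field name ->
    // name of the referenced message type, or null for a scalar field.
    static final class MessageType {
        final Map<String, String> fields = new LinkedHashMap<>();
    }

    // Builds the A/B types from the .proto definitions above.
    static Map<String, MessageType> exampleTypes() {
        MessageType a = new MessageType();
        a.fields.put("id", null);
        a.fields.put("extension", "B");
        MessageType b = new MessageType();
        b.fields.put("id", null);
        b.fields.put("extension", "B"); // B refers back to itself
        Map<String, MessageType> types = new LinkedHashMap<>();
        types.put("A", a);
        types.put("B", b);
        return types;
    }

    // Depth-first walk over message-typed fields. Returns the path up to and
    // including the first repeated type, or null if the walk terminates.
    static List<String> findCycle(String type, Map<String, MessageType> types,
                                  List<String> path) {
        if (path.contains(type)) {
            List<String> cycle = new ArrayList<>(path);
            cycle.add(type);
            return cycle;
        }
        path.add(type);
        for (String ref : types.get(type).fields.values()) {
            if (ref == null) {
                continue; // scalar field, nothing to descend into
            }
            List<String> cycle = findCycle(ref, types, path);
            if (cycle != null) {
                return cycle;
            }
        }
        path.remove(path.size() - 1);
        return null;
    }

    public static void main(String[] args) {
        List<String> cycle = findCycle("A", exampleTypes(), new ArrayList<>());
        System.out.println(cycle); // [A, B, B]
    }
}
```

Note that A itself is not self-referencing, but it reaches B, so converting A never terminates either.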
The self-referencing message B causes an infinite recursive loop when I try to
write an object of type A to Parquet:
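import java.io.IOException;
import java.util.List;

import com.google.protobuf.Message;
import com.google.protobuf.MessageOrBuilder;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.proto.ProtoParquetWriter;

public void writeMessages(Class<? extends Message> cls, Path file,
    List<MessageOrBuilder> records) throws IOException {
  // try-with-resources closes the writer even if a write fails
  try (ParquetWriter<MessageOrBuilder> writer =
      new ProtoParquetWriter<>(file, cls)) {
    for (MessageOrBuilder record : records) {
      writer.write(record);
    }
  }
}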
Message objects without the self-referencing fields are written without errors.
The recursive loop occurs during schema conversion, while the fields are
discovered from the message class. Here is a stack trace from a spark-shell run:
Exception in thread "main" java.lang.StackOverflowError
    at java.util.HashMap.inflateTable(HashMap.java:317)
    at java.util.HashMap.put(HashMap.java:488)
    at org.apache.parquet.schema.GroupType.<init>(GroupType.java:97)
    at org.apache.parquet.schema.Types$BaseGroupBuilder.build(Types.java:624)
    at org.apache.parquet.schema.Types$BaseGroupBuilder.build(Types.java:497)
    at org.apache.parquet.schema.Types$Builder.named(Types.java:286)
    at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:67)
    at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:98)
    at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:67)
    at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:98)
    at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:67)
    at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:98)
    at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:67)
    <... repeats until stack failure ...>
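The trace alternates between ProtoSchemaConverter.convertFields and addField: each message-typed field triggers conversion of that message's fields, and for B this never bottoms out. The sketch below is not parquet-mr code; it reuses a hypothetical field-map model to illustrate one possible workaround, a depth limit. Below maxDepth a message field becomes a nested group; at the limit it is emitted as an opaque binary leaf instead. The maxDepth parameter and the "(truncated)" binary representation are assumptions for illustration, not a ProtoParquetWriter option.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedSchemaConverter {
    // Hypothetical stand-in for a protobuf message descriptor: field name ->
    // name of the referenced message type, or null for a scalar string field.
    static final class MessageType {
        final Map<String, String> fields = new LinkedHashMap<>();
    }

    // Models messages A and B from the .proto definitions above.
    static Map<String, MessageType> exampleTypes() {
        MessageType a = new MessageType();
        a.fields.put("id", null);
        a.fields.put("extension", "B");
        MessageType b = new MessageType();
        b.fields.put("id", null);
        b.fields.put("extension", "B");
        Map<String, MessageType> types = new LinkedHashMap<>();
        types.put("A", a);
        types.put("B", b);
        return types;
    }

    private static String indent(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        return sb.toString();
    }

    // Same shape as the convertFields -> addField -> convertFields cycle in the
    // stack trace, but with an explicit depth counter.
    static void convertFields(String type, Map<String, MessageType> types,
                              int depth, int maxDepth, StringBuilder out) {
        for (Map.Entry<String, String> field : types.get(type).fields.entrySet()) {
            addField(field.getKey(), field.getValue(), types, depth, maxDepth, out);
        }
    }

    static void addField(String name, String ref, Map<String, MessageType> types,
                         int depth, int maxDepth, StringBuilder out) {
        if (ref == null) {
            // Scalar (string) field: a leaf column
            out.append(indent(depth)).append("binary ").append(name).append('\n');
        } else if (depth >= maxDepth) {
            // Depth limit reached: stop descending and emit an opaque leaf
            // (assumed: the sub-message would be stored as serialized bytes)
            out.append(indent(depth)).append("binary ").append(name).append(" (truncated)\n");
        } else {
            // Message field: a nested group converted one level deeper
            out.append(indent(depth)).append("group ").append(name).append('\n');
            convertFields(ref, types, depth + 1, maxDepth, out);
        }
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        convertFields("A", exampleTypes(), 0, 2, out);
        System.out.print(out);
    }
}
```

With maxDepth set to 2, the walk expands B twice and then writes the third-level extension as a single truncated binary leaf, so the conversion finishes instead of overflowing the stack.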
Is this a known issue, or is there an option I can pass to ProtoParquetWriter? I
haven't seen anything obvious.
Thanks
James McCudden
Architect
Relay Health Intelligence
RelayHealth
A division of McKesson