I have a Protocol buffer defined as such:

message A
{
    optional string id = 1;
    repeated B extension = 2;
}

The B message is:

message B
{
    optional string id = 1;
    repeated B extension = 2;
}

The self-referencing message "B" causes infinite recursion when trying to 
write an object of type A to Parquet:

        public void writeMessages(Class<? extends Message> cls, Path file,
                List<MessageOrBuilder> records) throws IOException {

            // cls is the generated protobuf class (A in this case)
            ParquetWriter<MessageOrBuilder> writer =
                    new ProtoParquetWriter<MessageOrBuilder>(file, cls);

            try {
                for (MessageOrBuilder record : records) {
                    writer.write(record);
                }
            } finally {
                writer.close();
            }
        }

Message objects without the self-referencing fields write without errors. The 
recursive loop occurs during field discovery from the class. Here is a 
stack trace from a spark-shell run:
Exception in thread "main" java.lang.StackOverflowError
        at java.util.HashMap.inflateTable(HashMap.java:317)
        at java.util.HashMap.put(HashMap.java:488)
        at org.apache.parquet.schema.GroupType.<init>(GroupType.java:97)
        at org.apache.parquet.schema.Types$BaseGroupBuilder.build(Types.java:624)
        at org.apache.parquet.schema.Types$BaseGroupBuilder.build(Types.java:497)
        at org.apache.parquet.schema.Types$Builder.named(Types.java:286)
        at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:67)
        at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:98)
        at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:67)
        at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:98)
        at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:67)
        at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:98)
        at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:67)
<...  repeats until stack failure ...>

Is this a known issue, or is there some option to pass to the 
ProtoParquetWriter? I haven't seen anything obvious.
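
For illustration, here is a minimal sketch of the kind of cycle check that 
would catch this before the stack overflows. It is not parquet-protobuf code: 
the class RecursiveSchemaCheck, the SCHEMA map and the isRecursive method are 
hypothetical names, and the schema is modelled as a plain map from message 
name to the message types its fields refer to. A real check would presumably 
walk the protobuf descriptors instead.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of parquet-protobuf.
class RecursiveSchemaCheck {

    // Stand-in for the proto definitions above: message name -> message
    // types used by its fields (scalar fields such as "id" are omitted).
    static final Map<String, List<String>> SCHEMA = Map.of(
            "A", List.of("B"),   // A.extension is "repeated B"
            "B", List.of("B"));  // B.extension is "repeated B" (self-reference)

    // Returns true if expanding root field by field would revisit a message
    // type that is still being expanded, i.e. the schema is recursive.
    static boolean isRecursive(String root, Map<String, List<String>> schema) {
        return visit(root, schema, new HashSet<>());
    }

    private static boolean visit(String type, Map<String, List<String>> schema,
                                 Set<String> inProgress) {
        if (!inProgress.add(type)) {
            return true; // type is already on the current path: cycle found
        }
        for (String child : schema.getOrDefault(type, List.of())) {
            if (visit(child, schema, inProgress)) {
                return true;
            }
        }
        inProgress.remove(type); // done with this branch
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isRecursive("A", SCHEMA)); // prints "true"
    }
}
```

Tracking only the types on the current path (rather than every type seen so 
far) means a message that is merely used in two different places is not 
reported as recursive.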

Thanks


James McCudden
Architect
Relay Health Intelligence

413.587.6819 Office
413.835.5441 Mobile

RelayHealth

A division of McKesson
