Claire McGinty created PARQUET-2425:
---------------------------------------

             Summary: AvroSchemaConverter doesn't support non-grouped repeated fields
                 Key: PARQUET-2425
                 URL: https://issues.apache.org/jira/browse/PARQUET-2425
             Project: Parquet
          Issue Type: Improvement
            Reporter: Claire McGinty


Currently AvroSchemaConverter#convert does not support Parquet-to-Avro 
conversions where the Parquet schema contains a non-grouped repeated type. For 
example, this operation:
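
    import org.apache.parquet.avro.AvroSchemaConverter;
    import org.apache.parquet.schema.MessageTypeParser;

    new AvroSchemaConverter()
        .convert(MessageTypeParser.parseMessageType(
            "message MySchema { repeated int32 repeatedField; }"));
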
triggers an UnsupportedOperationException("REPEATED not supported outside LIST
or MAP"), thrown at:
https://github.com/apache/parquet-mr/blob/apache-parquet-1.13.1/parquet-avro/src/main/java/org/apache/parquet/avro/AvroSchemaConverter.java#L292

However, if I'm interpreting the format spec correctly
(https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#nested-types),
non-grouped repeated types should be interpreted as a required list of required
elements:


> This does not affect repeated fields that are not annotated: A repeated field
> that is neither contained by a LIST- or MAP-annotated group
> nor annotated by LIST or MAP should be interpreted as a required list
> of required elements where the element type is the type of the field.
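
For illustration, the mapping this interpretation implies can be sketched in plain Java. This is a hypothetical helper, not parquet-avro code: RepeatedFieldMapping, Repetition and avroTypeFor are illustrative names. It returns the Avro JSON type a primitive, non-annotated Parquet field might convert to, assuming Parquet int32 maps to Avro "int" and optional fields become a union with null. Because the spec calls for a required list of required elements, the repeated case yields a plain array with non-nullable items:

```java
/**
 * Hypothetical sketch (not parquet-avro code): maps a primitive Parquet field
 * with no LIST/MAP annotation to the Avro JSON type the format spec implies.
 */
public class RepeatedFieldMapping {

    /** Parquet repetition levels, as named in the format spec. */
    enum Repetition { REQUIRED, OPTIONAL, REPEATED }

    /** Returns the Avro JSON type for a field with the given repetition and Avro primitive. */
    static String avroTypeFor(Repetition repetition, String avroPrimitive) {
        switch (repetition) {
            case REQUIRED:
                // Required scalar: the plain Avro type.
                return "\"" + avroPrimitive + "\"";
            case OPTIONAL:
                // Optional scalar: assumed to be a union with null, null first.
                return "[\"null\",\"" + avroPrimitive + "\"]";
            case REPEATED:
                // Non-annotated repeated field: a required list of required elements,
                // so neither the array nor its items are wrapped in a null union.
                return "{\"type\":\"array\",\"items\":\"" + avroPrimitive + "\"}";
            default:
                throw new IllegalArgumentException("Unknown repetition: " + repetition);
        }
    }

    public static void main(String[] args) {
        // Corresponds to: message MySchema { repeated int32 repeatedField; }
        System.out.println(avroTypeFor(Repetition.REPEATED, "int"));
    }
}
```

Under these assumptions, repeatedField would become an Avro field of type {"type":"array","items":"int"} instead of raising an exception.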


If this interpretation is correct, can we update AvroSchemaConverter to handle 
this use case? I'll put up a PR demonstrating it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
