Claire McGinty created PARQUET-2425:
---------------------------------------
Summary: AvroSchemaConverter doesn't support non-grouped repeated
fields
Key: PARQUET-2425
URL: https://issues.apache.org/jira/browse/PARQUET-2425
Project: Parquet
Issue Type: Improvement
Reporter: Claire McGinty
Currently AvroSchemaConverter#convert does not support Parquet-to-Avro
conversions where the Parquet schema contains a non-grouped repeated type. For
example, this operation:
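import org.apache.parquet.avro.AvroSchemaConverter;
import org.apache.parquet.schema.MessageTypeParser;

new AvroSchemaConverter()
    .convert(MessageTypeParser.parseMessageType(
        "message MySchema { repeated int32 repeatedField; }"));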
triggers an UnsupportedOperationException("REPEATED not supported outside LIST
or MAP"):
https://github.com/apache/parquet-mr/blob/apache-parquet-1.13.1/parquet-avro/src/main/java/org/apache/parquet/avro/AvroSchemaConverter.java#L292
However, if I'm interpreting the format spec correctly
(https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#nested-types),
an ungrouped repeated field should be treated as a REQUIRED list of REQUIRED elements:
> This does not affect repeated fields that are not annotated: A repeated field
> that is neither contained by a LIST- or MAP-annotated group
> nor annotated by LIST or MAP should be interpreted as a required list
> of required elements where the element type is the type of the field.
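A minimal, self-contained sketch of that interpretation follows. It is not AvroSchemaConverter code: the class and method names (UngroupedRepeatedSketch, avroFieldType, avroPrimitive) are hypothetical, it covers only a handful of primitive types, and it emits Avro type JSON as plain strings rather than building org.apache.avro.Schema objects. Under those assumptions, it maps a REPEATED primitive to a required Avro array of non-null items, alongside the usual REQUIRED and OPTIONAL cases:

```java
import java.util.Locale;

/**
 * Hypothetical sketch (not parquet-avro code): maps one Parquet primitive field
 * to an Avro field type (as JSON), following the LogicalTypes.md rule that an
 * un-annotated repeated field outside a LIST/MAP group is a required list of
 * required elements.
 */
public class UngroupedRepeatedSketch {

    // Assumed minimal subset of Parquet primitives -> Avro primitive names.
    static String avroPrimitive(String parquetPrimitive) {
        switch (parquetPrimitive.toUpperCase(Locale.ROOT)) {
            case "BOOLEAN": return "\"boolean\"";
            case "INT32":   return "\"int\"";
            case "INT64":   return "\"long\"";
            case "FLOAT":   return "\"float\"";
            case "DOUBLE":  return "\"double\"";
            case "BINARY":  return "\"bytes\"";
            default:
                throw new IllegalArgumentException("unhandled primitive: " + parquetPrimitive);
        }
    }

    // repetition is one of REQUIRED, OPTIONAL, REPEATED.
    static String avroFieldType(String repetition, String parquetPrimitive) {
        String element = avroPrimitive(parquetPrimitive);
        switch (repetition.toUpperCase(Locale.ROOT)) {
            case "REQUIRED":
                // Required primitive: the bare element type.
                return element;
            case "OPTIONAL":
                // Nullable primitive: union of null and the element type.
                return "[\"null\"," + element + "]";
            case "REPEATED":
                // Ungrouped repeated field: a required array whose items are
                // required (non-null) values of the field's own type.
                return "{\"type\":\"array\",\"items\":" + element + "}";
            default:
                throw new IllegalArgumentException("unknown repetition: " + repetition);
        }
    }

    public static void main(String[] args) {
        // Corresponds to: message MySchema { repeated int32 repeatedField; }
        System.out.println(avroFieldType("REPEATED", "INT32"));
    }
}
```

For the example schema above, this prints {"type":"array","items":"int"} as the type of repeatedField. A converter following the spec would presumably place that field inside the Avro record generated for MySchema, instead of throwing.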
If this interpretation is correct, can we update AvroSchemaConverter to handle
this use case? I'll put up a PR demonstrating it.