[
https://issues.apache.org/jira/browse/PARQUET-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gang Wu resolved PARQUET-2425.
------------------------------
Fix Version/s: 1.14.0
Assignee: Claire McGinty
Resolution: Fixed
> AvroSchemaConverter doesn't support non-grouped repeated fields
> ---------------------------------------------------------------
>
> Key: PARQUET-2425
> URL: https://issues.apache.org/jira/browse/PARQUET-2425
> Project: Parquet
> Issue Type: Improvement
> Reporter: Claire McGinty
> Assignee: Claire McGinty
> Priority: Major
> Fix For: 1.14.0
>
>
> Currently AvroSchemaConverter#convert does not support Parquet-to-Avro
> conversions where the Parquet schema contains a non-grouped repeated type.
> For example, this operation:
>
> new AvroSchemaConverter()
>     .convert(MessageTypeParser.parseMessageType(
>         "message MySchema { repeated int32 repeatedField; }"
>     ))
>
> triggers an UnsupportedOperationException("REPEATED not supported outside
> LIST or MAP"):
> https://github.com/apache/parquet-mr/blob/apache-parquet-1.13.1/parquet-avro/src/main/java/org/apache/parquet/avro/AvroSchemaConverter.java#L292
>
> However, if I'm interpreting the format spec correctly
> (https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#nested-types),
> ungrouped repeated types should be treated as required lists of required elements:
> > This does not affect repeated fields that are not annotated: A repeated
> > field that is neither contained by a LIST- or MAP-annotated
> > group nor annotated by LIST or MAP should be interpreted as a
> > required list of required elements where the element type is the type of
> > the field.
> If this interpretation is correct, can we update AvroSchemaConverter to
> handle this use case? I'll put up a PR demonstrating it.
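
The rule quoted from the format spec can be sketched in plain Java. This is a simplified illustration only, not the parquet-mr implementation or the change that resolved this issue: the class RepeatedFieldSketch, the Repetition enum, and the avroTypeFor helper are hypothetical names, and Avro types are modeled as JSON strings rather than as org.apache.avro.Schema objects.

```java
public class RepeatedFieldSketch {
    // Hypothetical stand-in for Parquet's three repetition levels.
    enum Repetition { REQUIRED, OPTIONAL, REPEATED }

    // Returns the Avro JSON type for one primitive field that sits outside
    // any LIST- or MAP-annotated group (assumption: primitives only).
    static String avroTypeFor(Repetition repetition, String avroPrimitive) {
        String element = "\"" + avroPrimitive + "\"";
        switch (repetition) {
            case REQUIRED:
                return element;
            case OPTIONAL:
                // Optional fields become a union with null.
                return "[\"null\", " + element + "]";
            case REPEATED:
                // Per the quoted spec text: a required list of required
                // elements, i.e. a non-nullable array of non-nullable items.
                return "{\"type\": \"array\", \"items\": " + element + "}";
            default:
                throw new IllegalArgumentException("Unknown repetition: " + repetition);
        }
    }

    public static void main(String[] args) {
        // Models "repeated int32 repeatedField" from the example schema.
        System.out.println(avroTypeFor(Repetition.REPEATED, "int"));
    }
}
```

Under this model, the repeatedField from the example maps to a non-nullable Avro array of int instead of raising an exception.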