[ https://issues.apache.org/jira/browse/NIFI-8154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268869#comment-17268869 ]

Glenn Jones commented on NIFI-8154:
-----------------------------------
The test fails because it expects a field in the Record produced by
ConvertAvroToParquet to be named "map", but it is actually named "key_value".

In parquet-avro 1.10.0, AvroParquetWriter produces Parquet with a schema that
includes the following definition for the mymap field from the test Avro:

  required group mymap (MAP) {
    repeated group map (MAP_KEY_VALUE) {
      required binary key (UTF8);
      required int32 value;
    }
  }

This doesn't conform to the Map logical type, but it is within the [backward
compatibility
rules|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#backward-compatibility-rules-1].

In parquet-avro 1.11.1, AvroParquetWriter produces the following, which I think
is more correct (the middle level is named "key_value" instead of "map"):

  required group mymap (MAP) {
    repeated group key_value (MAP_KEY_VALUE) {
      required binary key (STRING);
      required int32 value;
    }
  }

The test uses GroupReadSupport to read the Parquet into something it can
examine. GroupReadSupport exposes the middle-level group by name, so the test
now sees "key_value" where it used to see "map". I doubt that other
ReadSupport implementations would expose the name of the middle-level group in
this way, so perhaps this wouldn't have been an issue if the tests had used
AvroReadSupport. In any case, I think it's fine to simply update the tests to
expect the field names from the 1.11.1 AvroParquetWriter.
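
As an illustration only, here is a minimal sketch of why a name-based lookup is
sensitive to this change while a positional lookup is not. It uses a
hypothetical Node class as a simplified stand-in for a Parquet group; it is not
the parquet-mr schema API, and MapGroupLookup, myMap and describe are names
made up for this example. The assumption is that a MAP-annotated group has
exactly one repeated child holding the key and value fields, as in both
schemas above.

```java
import java.util.List;

/** Hypothetical, simplified model of a Parquet group; not the parquet-mr API. */
public class MapGroupLookup {

    /** A named schema node with zero or more child fields. */
    public static final class Node {
        public final String name;
        public final List<Node> children;

        public Node(String name, Node... children) {
            this.name = name;
            this.children = List.of(children);
        }
    }

    /**
     * Returns the middle-level group of a MAP-annotated field by position,
     * so the result does not depend on whether it is named "map" or "key_value".
     */
    public static Node keyValueGroup(Node mapField) {
        if (mapField.children.size() != 1) {
            throw new IllegalArgumentException(
                "expected exactly one child under " + mapField.name);
        }
        return mapField.children.get(0);
    }

    /** Builds the mymap field from the comment with the given middle-level name. */
    public static Node myMap(String middleName) {
        return new Node("mymap",
            new Node(middleName, new Node("key"), new Node("value")));
    }

    /** Formats the middle-level name and its two fields for display. */
    public static String describe(Node mapField) {
        Node kv = keyValueGroup(mapField);
        return kv.name + " -> " + kv.children.get(0).name
            + ", " + kv.children.get(1).name;
    }

    public static void main(String[] args) {
        // 1.10.0-style layout, then 1.11.1-style layout.
        System.out.println(describe(myMap("map")));       // map -> key, value
        System.out.println(describe(myMap("key_value"))); // key_value -> key, value
    }
}
```

Both layouts resolve to the same key and value fields; only the middle-level
name differs. A test written this way would pass with either writer version,
whereas a test that asserts on the name must be updated as described above.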
> AvroParquetHDFSRecordReader fails to read parquet file containing nested
> structs
> --------------------------------------------------------------------------------
>
> Key: NIFI-8154
> URL: https://issues.apache.org/jira/browse/NIFI-8154
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.11.3, 1.12.1
> Reporter: Glenn Jones
> Priority: Minor
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> FetchParquet can't be used to process files containing nested structs. When
> trying to create a RecordSchema it runs into
> https://issues.apache.org/jira/browse/PARQUET-1441, which causes it to fail.
> We've patched this locally by building the nifi-parquet-processors with
> parquet-avro 1.11.0, but it would be great if this made it into the next
> release.