Daniel Solow created NIFI-5735:
----------------------------------
Summary: Record-oriented processors/services do not properly
support Avro Unions
Key: NIFI-5735
URL: https://issues.apache.org/jira/browse/NIFI-5735
Project: Apache NiFi
Issue Type: Bug
Components: Core Framework
Affects Versions: 1.7.1
Reporter: Daniel Solow
The [Avro spec|https://avro.apache.org/docs/1.8.2/spec.html#Unions] states:
{quote}Unions may not contain more than one schema with the same type, *except
for the named types* record, fixed and enum. For example, unions containing two
array types or two map types are not permitted, but two types with different
names are permitted. (Names permit efficient resolution when reading and
writing unions.)
{quote}
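As a rough illustration of the rule quoted above, here is a minimal Python sketch that checks whether a union's branches are distinguishable. It is not taken from NiFi or from the Avro library. The helper names `branch_key` and `is_valid_union` are hypothetical, and the sketch assumes the branches are already-parsed JSON schemas and ignores namespaces.

```python
NAMED_TYPES = {"record", "fixed", "enum"}

def branch_key(schema):
    """Return the key that must be unique among a union's branches."""
    if isinstance(schema, str):
        # A primitive such as "string", or a reference to a named type.
        return schema
    if isinstance(schema, list):
        raise ValueError("a union may not immediately contain another union")
    kind = schema["type"]
    if kind in NAMED_TYPES:
        # Named types are told apart by name, not just by type.
        return (kind, schema["name"])
    # Unnamed complex types (array, map) and annotated primitives.
    return kind

def is_valid_union(branches):
    """True if no two branches share the same key."""
    keys = [branch_key(b) for b in branches]
    return len(keys) == len(set(keys))
```

Under these assumptions, a union of two records named "left" and "right" passes the check, while a union of two array types does not.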
However, record-oriented processors and services in NiFi do not support
multiple named types per union. This is a problem, for example, with the
following schema:
{code:javascript}
{
  "type": "record",
  "name": "root",
  "fields": [
    {
      "name": "children",
      "type": {
        "type": "array",
        "items": [
          {
            "type": "record",
            "name": "left",
            "fields": [
              {
                "name": "f1",
                "type": "string"
              }
            ]
          },
          {
            "type": "record",
            "name": "right",
            "fields": [
              {
                "name": "f2",
                "type": "int"
              }
            ]
          }
        ]
      }
    }
  ]
}
{code}
This schema contains a field named "children", which is an array whose items
are a union. The union contains two possible record types. Currently the NiFi
Avro utilities fail to process records of this schema when the array contains
both "left" and "right" record types.
I've traced this bug to the [AvroTypeUtil
class|https://github.com/apache/nifi/blob/rel/nifi-1.7.1/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java].
Specifically, there are bugs in the convertUnionFieldValue method and in the
buildAvroSchema method. Both methods assume that an Avro union can contain at
most one branch of each type. As stated in the spec, this is true for unnamed
types (primitives, including primitive logicalTypes, as well as arrays and
maps) but not for named types.
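To make the faulty assumption concrete, here is a small Python sketch. It is illustrative only and is not the AvroTypeUtil code. All function names are hypothetical, and matching a value to a record branch by its exact field names is a simplifying assumption made for this example.

```python
# The "items" union from the schema above, as parsed JSON.
UNION = [
    {"type": "record", "name": "left",
     "fields": [{"name": "f1", "type": "string"}]},
    {"type": "record", "name": "right",
     "fields": [{"name": "f2", "type": "int"}]},
]

def index_by_type(branches):
    # One slot per type: the second record silently replaces the first.
    return {b["type"]: b for b in branches}

def index_by_type_and_name(branches):
    # Including the name keeps both record branches.
    return {(b["type"], b.get("name")): b for b in branches}

def pick_record_branch(branches, value):
    """Return the name of the first record branch whose field names
    exactly match the keys of value, or None if nothing matches."""
    for b in branches:
        if isinstance(b, dict) and b.get("type") == "record":
            if {f["name"] for f in b["fields"]} == set(value):
                return b["name"]
    return None

# An array holding one "left" and one "right" record.
children = [{"f1": "a"}, {"f2": 1}]
resolved = [pick_record_branch(UNION, c) for c in children]
```

Indexing by type alone leaves a single "record" entry, so one of the two branches can never be selected. Keying on type and name together keeps both, which is what the spec's exception for named types requires.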