[
https://issues.apache.org/jira/browse/NIFI-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16659117#comment-16659117
]
Daniel Solow commented on NIFI-5735:
------------------------------------
If NiFi wants to support Avro unions properly, it will probably be necessary to
change how the record-oriented processors behave. For example, right now a
sample record conforming to the example schema in the issue description is
converted into the following by the JsonRecordSetWriter:
{code:javascript}
{
  "children" : [
    { "f1" : "a" },
    { "f2" : 1 }
  ]
}
{code}
So the record type name has been erased. One possible format that keeps the
record type name is the following, as produced by avro-cli:
{code:javascript}
{
  "children" : [
    { "left" : { "f1" : "a" } },
    { "right" : { "f2" : 2 } }
  ]
}
{code}
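The minimal, dependency-free Java sketch below makes the difference between the two shapes concrete. It does not use the Avro or NiFi APIs. `UnionJsonSketch`, `TypedRecord`, and their methods are hypothetical names. The sketch assumes that every record value still knows the name of the union branch it was written with, which is exactly the information the current writer drops. Strings are not JSON-escaped, so treat it as an illustration of the output shape only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, dependency-free model (not NiFi or Avro API): a record value
// tagged with the name of the named union branch it belongs to.
final class UnionJsonSketch {

    static final class TypedRecord {
        final String typeName;             // union branch name, e.g. "left"
        final Map<String, Object> fields;  // field name -> String or Integer

        TypedRecord(String typeName, Map<String, Object> fields) {
            this.typeName = typeName;
            this.fields = fields;
        }
    }

    // Convenience factory: of("left", "f1", "a") builds {f1: "a"} tagged "left".
    static TypedRecord of(String typeName, Object... keyValues) {
        Map<String, Object> fields = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            fields.put((String) keyValues[i], keyValues[i + 1]);
        }
        return new TypedRecord(typeName, fields);
    }

    // Flat JSON object; strings are quoted but NOT escaped (illustration only).
    static String fieldsJson(Map<String, Object> fields) {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            Object v = e.getValue();
            String json = (v instanceof String) ? "\"" + v + "\"" : String.valueOf(v);
            parts.add("\"" + e.getKey() + "\":" + json);
        }
        return "{" + String.join(",", parts) + "}";
    }

    // Current behaviour: the branch name is dropped.
    static String erased(TypedRecord r) {
        return fieldsJson(r.fields);
    }

    // Proposed behaviour: wrap the value in an object keyed by the branch name.
    static String wrapped(TypedRecord r) {
        return "{\"" + r.typeName + "\":" + fieldsJson(r.fields) + "}";
    }

    // Renders a "children" array using either representation.
    static String children(List<TypedRecord> items, boolean keepTypeName) {
        List<String> parts = new ArrayList<>();
        for (TypedRecord r : items) {
            parts.add(keepTypeName ? wrapped(r) : erased(r));
        }
        return "{\"children\":[" + String.join(",", parts) + "]}";
    }

    public static void main(String[] args) {
        List<TypedRecord> items = List.of(of("left", "f1", "a"), of("right", "f2", 1));
        System.out.println(children(items, false)); // {"children":[{"f1":"a"},{"f2":1}]}
        System.out.println(children(items, true));  // {"children":[{"left":{"f1":"a"}},{"right":{"f2":1}}]}
    }
}
```

With `keepTypeName` set to false, the output has the shape the JsonRecordSetWriter produces today. With it set to true, each array element carries its branch name, in the same shape as the avro-cli output.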
> Record-oriented processors/services do not properly support Avro Unions
> -----------------------------------------------------------------------
>
> Key: NIFI-5735
> URL: https://issues.apache.org/jira/browse/NIFI-5735
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework, Extensions
> Affects Versions: 1.7.1
> Reporter: Daniel Solow
> Priority: Major
> Labels: AVRO, avro
> Attachments:
> 0001-NIFI-5735-added-preliminary-support-for-union-resolu.patch
>
>
> The [Avro spec|https://avro.apache.org/docs/1.8.2/spec.html#Unions] states:
> {quote}Unions may not contain more than one schema with the same type,
> *except for the named types* record, fixed and enum. For example, unions
> containing two array types or two map types are not permitted, but two types
> with different names are permitted. (Names permit efficient resolution when
> reading and writing unions.)
> {quote}
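The quoted rule can be restated as a small check. This is a hypothetical Java model of that sentence, not the Avro library's own validation. `UnionRuleSketch`, `Member`, `named`, and `unnamed` are illustrative names. Under this model, an unnamed type may appear at most once in a union. Record, enum, and fixed members only need distinct names.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the union rule quoted above (not the Avro library API).
final class UnionRuleSketch {

    // The named types per the spec: record, enum and fixed.
    static final Set<String> NAMED_TYPES = Set.of("record", "enum", "fixed");

    static final class Member {
        final String type;      // e.g. "record", "string", "array"
        final String fullName;  // full name for named types, null otherwise

        Member(String type, String fullName) {
            this.type = type;
            this.fullName = fullName;
        }
    }

    static Member named(String type, String fullName) { return new Member(type, fullName); }

    static Member unnamed(String type) { return new Member(type, null); }

    // True if no unnamed type appears twice and no name is used twice.
    static boolean isValidUnion(Member... members) {
        Set<String> seenTypes = new HashSet<>();
        Set<String> seenNames = new HashSet<>();
        for (Member m : members) {
            if (NAMED_TYPES.contains(m.type)) {
                if (!seenNames.add(m.fullName)) return false; // same name used twice
            } else if (!seenTypes.add(m.type)) {
                return false; // e.g. two arrays or two maps
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two records with different names, as in the "children" item union.
        System.out.println(isValidUnion(named("record", "left"), named("record", "right"))); // true
        System.out.println(isValidUnion(unnamed("array"), unnamed("array")));               // false
    }
}
```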
> However, record-oriented processors/services in NiFi do not support multiple
> named types per union. This is a problem, for example, with the following
> schema:
> {code:javascript}
> {
>   "type": "record",
>   "name": "root",
>   "fields": [
>     {
>       "name": "children",
>       "type": {
>         "type": "array",
>         "items": [
>           {
>             "type": "record",
>             "name": "left",
>             "fields": [
>               {
>                 "name": "f1",
>                 "type": "string"
>               }
>             ]
>           },
>           {
>             "type": "record",
>             "name": "right",
>             "fields": [
>               {
>                 "name": "f2",
>                 "type": "int"
>               }
>             ]
>           }
>         ]
>       }
>     }
>   ]
> }
> {code}
> This schema contains a field named "children", which is an array whose item
> type is a union. The union contains two possible record types. Currently the
> NiFi Avro utilities fail to process records of this schema whose "children"
> arrays contain both "left" and "right" records.
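Once the branch name is lost, the only remaining clue for choosing between "left" and "right" is the shape of the value. The sketch below, a hypothetical `BranchByFieldsSketch` that is not NiFi code, resolves each element by comparing its keys with each record branch's field names. It is an assumed workaround for illustration only. It breaks as soon as two named records share the same field names, which the spec allows, so it is no substitute for keeping the type name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical heuristic (not NiFi code): pick the union branch for an
// untagged value by comparing its keys with each record branch's field names.
final class BranchByFieldsSketch {

    // Record branches of the "children" item union: record name -> field names.
    static final Map<String, Set<String>> BRANCHES = new LinkedHashMap<>();
    static {
        BRANCHES.put("left", Set.of("f1"));
        BRANCHES.put("right", Set.of("f2"));
    }

    // Returns the single branch whose field names equal the value's keys;
    // throws if no branch or more than one branch matches.
    static String resolve(Map<String, ?> value) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Set<String>> branch : BRANCHES.entrySet()) {
            if (branch.getValue().equals(value.keySet())) {
                matches.add(branch.getKey());
            }
        }
        if (matches.size() != 1) {
            throw new IllegalArgumentException("cannot resolve " + value.keySet() + ", matches: " + matches);
        }
        return matches.get(0);
    }

    public static void main(String[] args) {
        // A mixed "children" array: one "left" element and one "right" element.
        List<Map<String, ?>> children = List.of(Map.of("f1", "a"), Map.of("f2", 1));
        for (Map<String, ?> child : children) {
            System.out.println(child + " -> " + resolve(child));
        }
    }
}
```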
> I've traced this bug to the
> [AvroTypeUtil class|https://github.com/apache/nifi/blob/rel/nifi-1.7.1/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java].
> Specifically, there are bugs in the convertUnionFieldValue method and in the
> buildAvroSchema method. Both of these methods assume that an Avro union can
> contain at most one member of each type. As stated in the spec, this is true
> for primitive types and unnamed complex types such as arrays and maps, but not
> for the named types record, enum and fixed.
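The following is a simplified model of that assumption, written for illustration only. It is not the actual AvroTypeUtil code, and `UnionLookupSketch`, `resolveByTypeOnly`, and `resolve` are hypothetical names. Looking up a union member by type alone always returns the first record branch, so a "right" value is matched against "left". Also matching on the full name for named types picks the correct branch. Here the full names are simply "left" and "right", since the example schema declares no namespace.

```java
import java.util.List;

// Simplified model of the one-member-per-type assumption; this is NOT the
// actual AvroTypeUtil code, only an illustration of type-keyed vs name-keyed lookup.
final class UnionLookupSketch {

    static final class Member {
        final String type;      // e.g. "record", "string"
        final String fullName;  // full name for record/enum/fixed, null otherwise

        Member(String type, String fullName) {
            this.type = type;
            this.fullName = fullName;
        }
    }

    static boolean isNamed(String type) {
        return type.equals("record") || type.equals("enum") || type.equals("fixed");
    }

    // Flawed assumption: at most one member per type, so the first match wins.
    static Member resolveByTypeOnly(List<Member> union, String type) {
        for (Member m : union) {
            if (m.type.equals(type)) return m;
        }
        throw new IllegalArgumentException("no member of type " + type);
    }

    // Fix sketch: named types must also match on their full name.
    static Member resolve(List<Member> union, String type, String fullName) {
        for (Member m : union) {
            if (!m.type.equals(type)) continue;
            if (!isNamed(type) || m.fullName.equals(fullName)) return m;
        }
        throw new IllegalArgumentException("no member " + type + " " + fullName);
    }

    public static void main(String[] args) {
        List<Member> union = List.of(new Member("record", "left"), new Member("record", "right"));
        // A "right" value resolved by type alone lands on the "left" branch.
        System.out.println(resolveByTypeOnly(union, "record").fullName); // left
        System.out.println(resolve(union, "record", "right").fullName);  // right
    }
}
```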
> There may be related bugs elsewhere, but I haven't been able to locate them
> yet.
>
>