[ 
https://issues.apache.org/jira/browse/NIFI-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Solow updated NIFI-5735:
-------------------------------
    Description: 
The [Avro spec|https://avro.apache.org/docs/1.8.2/spec.html#Unions] states:
{quote}Unions may not contain more than one schema with the same type, *except 
for the named types* record, fixed and enum. For example, unions containing two 
array types or two map types are not permitted, but two types with different 
names are permitted. (Names permit efficient resolution when reading and 
writing unions.)
{quote}
However, record-oriented processors/services in NiFi do not support multiple 
named types per union. This is a problem, for example, with the following 
schema:
{code:javascript}
{
    "type": "record",
    "name": "root",
    "fields": [
        {
            "name": "children",
            "type": {
                "type": "array",
                "items": [
                    {
                        "type": "record",
                        "name": "left",
                        "fields": [
                            {
                                "name": "f1",
                                "type": "string"
                            }
                        ]
                    },
                    {
                        "type": "record",
                        "name": "right",
                        "fields": [
                            {
                                "name": "f2",
                                "type": "int"
                            }
                        ]
                    }
                ]
            }
        }
    ]
}
{code}
This schema contains a field named "children", which is an array whose items 
are a union. The union contains two possible record types. Currently the NiFi 
Avro utilities fail to process records of this schema whose "children" arrays 
contain both "left" and "right" records.
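
To make the quoted rule concrete, here is a minimal Python sketch of the 
uniqueness check it describes. This is not NiFi or Avro library code; the names 
branch_key and is_valid_union are hypothetical, and the sketch ignores 
namespaces, nested unions, and references to named types by string.

```python
# Sketch of the Avro union rule quoted from the spec (assumption: simplified,
# no namespaces, no references to previously defined named types).
NAMED_TYPES = {"record", "fixed", "enum"}


def branch_key(branch):
    """Return the key that must be unique within a union for this branch."""
    # A branch is either a type-name string ("int") or a schema dict.
    type_name = branch if isinstance(branch, str) else branch["type"]
    if type_name in NAMED_TYPES:
        # Named types are told apart by their name, not only by their type.
        return (type_name, branch["name"])
    return (type_name, None)


def is_valid_union(branches):
    """True if no two branches share a key, per the quoted rule."""
    keys = [branch_key(b) for b in branches]
    return len(keys) == len(set(keys))


left = {"type": "record", "name": "left",
        "fields": [{"name": "f1", "type": "string"}]}
right = {"type": "record", "name": "right",
         "fields": [{"name": "f2", "type": "int"}]}

# Two records with different names: permitted.
two_records_ok = is_valid_union([left, right])
# Two array types: not permitted.
two_arrays_ok = is_valid_union([
    {"type": "array", "items": "int"},
    {"type": "array", "items": "string"},
])
```

Under this check the "items" union of the schema above is valid, so a reader or 
writer that rejects it is stricter than the spec.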

I've traced this bug to the [AvroTypeUtil 
class|https://github.com/apache/nifi/blob/rel/nifi-1.7.1/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java].

Specifically, there are bugs in the convertUnionFieldValue method and in the 
buildAvroSchema method. Both methods assume that an Avro union can contain at 
most one branch of each type. As stated in the spec, this holds for unnamed 
types (primitives, including primitive logicalTypes, as well as arrays and 
maps) but not for the named types record, fixed and enum.
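
The following Python sketch illustrates the kind of assumption described above 
and one possible way around it. It is not the actual AvroTypeUtil code; by_type, 
matches_record and resolve_record_branch are hypothetical names, and matching a 
value to a branch by its exact set of field names is an assumption made only 
for this example.

```python
left = {"type": "record", "name": "left",
        "fields": [{"name": "f1", "type": "string"}]}
right = {"type": "record", "name": "right",
         "fields": [{"name": "f2", "type": "int"}]}
union = [left, right]

# Indexing branches by type alone keeps only one "record" branch:
# the later entry silently overwrites the earlier one.
by_type = {branch["type"]: branch for branch in union}


def matches_record(schema, value):
    """True if value is a dict whose keys are exactly the record's field names."""
    if not isinstance(value, dict):
        return False
    return set(value) == {field["name"] for field in schema["fields"]}


def resolve_record_branch(branches, value):
    """Return the first record branch that the value fits, or None."""
    for branch in branches:
        is_record = isinstance(branch, dict) and branch.get("type") == "record"
        if is_record and matches_record(branch, value):
            return branch
    return None


# A "children" array that mixes both record types.
children = [{"f1": "a"}, {"f2": 1}]
resolved = [resolve_record_branch(union, child)["name"] for child in children]
```

Here by_type ends up with a single entry (the "right" record), while checking 
every record branch resolves the two children to "left" and "right" 
respectively.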

 There may be related bugs elsewhere, but I haven't been able to locate them 
yet.

 

 

  was:
The [Avro spec|https://avro.apache.org/docs/1.8.2/spec.html#Unions] states:
{quote}Unions may not contain more than one schema with the same type, *except 
for the named types* record, fixed and enum. For example, unions containing two 
array types or two map types are not permitted, but two types with different 
names are permitted. (Names permit efficient resolution when reading and 
writing unions.)
{quote}
However, record-oriented processors/services in NiFi do not support multiple 
named types per union. This is a problem, for example, with the following 
schema:
{code:javascript}
{
    "type": "record",
    "name": "root",
    "fields": [
        {
            "name": "children",
            "type": {
                "type": "array",
                "items": [
                    {
                        "type": "record",
                        "name": "left",
                        "fields": [
                            {
                                "name": "f1",
                                "type": "string"
                            }
                        ]
                    },
                    {
                        "type": "record",
                        "name": "right",
                        "fields": [
                            {
                                "name": "f2",
                                "type": "int"
                            }
                        ]
                    }
                ]
            }
        }
    ]
}
{code}
This schema contains a field named "children", which is an array whose items 
are a union. The union contains two possible record types. Currently the NiFi 
Avro utilities fail to process records of this schema whose arrays contain 
both "left" and "right" records.

I've traced this bug to the [AvroTypeUtil 
class|https://github.com/apache/nifi/blob/rel/nifi-1.7.1/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java].

Specifically, there are bugs in the convertUnionFieldValue method and in the 
buildAvroSchema method. Both methods assume that an Avro union can contain at 
most one branch of each type. As stated in the spec, this holds for unnamed 
types (primitives, including primitive logicalTypes, as well as arrays and 
maps) but not for the named types record, fixed and enum.

 There may be related bugs elsewhere, but I haven't been able to locate them 
yet.

 

 


> Record-oriented processors/services do not properly support Avro Unions
> -----------------------------------------------------------------------
>
>                 Key: NIFI-5735
>                 URL: https://issues.apache.org/jira/browse/NIFI-5735
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.7.1
>            Reporter: Daniel Solow
>            Priority: Major
>              Labels: AVRO, avro



