[ 
https://issues.apache.org/jira/browse/NIFI-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Savitsky updated NIFI-5735:
--------------------------------
    Attachment: NIFI-5735.patch

> Record-oriented processors/services do not properly support Avro Unions
> -----------------------------------------------------------------------
>
>                 Key: NIFI-5735
>                 URL: https://issues.apache.org/jira/browse/NIFI-5735
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework, Extensions
>    Affects Versions: 1.7.1
>            Reporter: Daniel Solow
>            Priority: Major
>              Labels: AVRO, avro
>         Attachments: 
> 0001-NIFI-5735-added-preliminary-support-for-union-resolu.patch, 
> NIFI-5735.patch
>
>
> The [Avro spec|https://avro.apache.org/docs/1.8.2/spec.html#Unions] states:
> {quote}Unions may not contain more than one schema with the same type, 
> *except for the named types* record, fixed and enum. For example, unions 
> containing two array types or two map types are not permitted, but two types 
> with different names are permitted. (Names permit efficient resolution when 
> reading and writing unions.)
> {quote}
> However, record-oriented processors/services in NiFi do not support multiple 
> named types per union. This is a problem, for example, with the following 
> schema:
> {code:javascript}
> {
>     "type": "record",
>     "name": "root",
>     "fields": [
>         {
>             "name": "children",
>             "type": {
>                 "type": "array",
>                 "items": [
>                     {
>                         "type": "record",
>                         "name": "left",
>                         "fields": [
>                             {
>                                 "name": "f1",
>                                 "type": "string"
>                             }
>                         ]
>                     },
>                     {
>                         "type": "record",
>                         "name": "right",
>                         "fields": [
>                             {
>                                 "name": "f2",
>                                 "type": "int"
>                             }
>                         ]
>                     }
>                 ]
>             }
>         }
>     ]
> }
> {code}
>  This schema contains a field named "children", which is an array whose 
> items are a union of two record types, "left" and "right". Currently the 
> NiFi Avro utilities fail to process records of this schema when the 
> "children" array contains both "left" and "right" records.
> I've traced this bug to the [AvroTypeUtil 
> class|https://github.com/apache/nifi/blob/rel/nifi-1.7.1/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java].
> Specifically, there are bugs in the convertUnionFieldValue method and in the 
> buildAvroSchema method. Both methods assume that an Avro union can contain 
> at most one member schema of each type. As stated in the spec, this holds 
> for primitive types and unnamed complex types (array, map), but not for 
> named types (record, fixed, enum).
>  There may be related bugs elsewhere, but I haven't been able to locate them 
> yet.
>  
>  
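To illustrate the assumption described above, here is a minimal sketch in
plain Python (standard library only). It is not NiFi's code and does not use
the Avro library. The helper names (union_key, index_by_type, index_by_name,
resolve_record_branch) are hypothetical, and matching a datum to a record
branch by its exact set of field names is a simplifying assumption made only
for this example. It shows that indexing union members by type alone collapses
"left" and "right" into one entry, while indexing named types by name keeps
both.

```python
import json

# The schema from the report, condensed onto fewer lines.
SCHEMA_JSON = """
{"type": "record", "name": "root", "fields": [
  {"name": "children", "type": {"type": "array", "items": [
    {"type": "record", "name": "left",
     "fields": [{"name": "f1", "type": "string"}]},
    {"type": "record", "name": "right",
     "fields": [{"name": "f2", "type": "int"}]}
  ]}}
]}
"""

# Per the Avro spec, only these types may appear more than once in a union.
NAMED_TYPES = {"record", "enum", "fixed"}


def union_key(member):
    """Return (type, name) for named types and (type, None) otherwise."""
    if isinstance(member, dict):
        type_name = member["type"]
        if type_name in NAMED_TYPES:
            return (type_name, member["name"])
        return (type_name, None)
    # Primitive types are written as bare strings, e.g. "int".
    return (member, None)


def index_by_type(union):
    """Flawed indexing: a later member of the same type overwrites an earlier one."""
    return {union_key(member)[0]: member for member in union}


def index_by_name(union):
    """Indexing that keeps every named member distinct."""
    return {union_key(member): member for member in union}


def resolve_record_branch(union, datum):
    """Pick the record branch whose field names equal the datum's keys.

    Assumption for this sketch only: field-name sets are unique per branch.
    """
    for member in index_by_name(union).values():
        if isinstance(member, dict) and member["type"] == "record":
            field_names = {field["name"] for field in member["fields"]}
            if field_names == set(datum):
                return member["name"]
    raise ValueError("no union branch matches datum: %r" % (datum,))


schema = json.loads(SCHEMA_JSON)
union = schema["fields"][0]["type"]["items"]

by_type = index_by_type(union)
by_name = index_by_name(union)

# A "children" array that mixes both record types, as in the failing case.
children = [{"f1": "a"}, {"f2": 1}]
branches = [resolve_record_branch(union, child) for child in children]

print(sorted(by_type))   # members visible when keyed by type only
print(sorted(by_name))   # members visible when named types are keyed by name
print(branches)          # branch chosen for each child
```

With type-only indexing, only one of the two record schemas remains reachable,
so any array element belonging to the other record cannot be matched to its
schema. Keying named types by name keeps both branches available.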



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
