[ 
https://issues.apache.org/jira/browse/NIFI-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675735#comment-16675735
 ] 

Alex Savitsky edited comment on NIFI-5735 at 11/6/18 1:36 PM:
--------------------------------------------------------------

Attached is a patch against the master NiFi branch that fixes the issue.

General idea: convertToAvroObject now returns a pair of the original conversion 
result and the number of fields that failed the conversion for the underlying 
record type, if any (0 otherwise).

The second pair element is used in only one place: the lambda passed to 
convertUnionFieldValue.

Instead of simply returning the converted Avro object, the lambda now inspects 
the number of failed fields, throwing an exception if this number is not zero.

This signals the schema conversion error to the caller, allowing 
convertUnionFieldValue to continue iterating over the union's schemas until it 
finds one in which all fields are recognized.

[^NIFI-5735.patch]
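
For readers without the patch at hand, the idea described above can be sketched 
roughly as follows. This is a simplified illustration, not the contents of the 
patch: it uses no Avro or NiFi classes, and every name in it 
(UnionResolutionSketch, RecordSchema, Conversion, convert, resolveUnion) is 
hypothetical. It only assumes that a conversion reports how many input fields 
the candidate schema did not recognize, and that a non-zero count is turned 
into an exception so that the next union branch is tried.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class UnionResolutionSketch {

    /** Hypothetical stand-in for an Avro record schema: a name plus its known field names. */
    static final class RecordSchema {
        final String name;
        final Set<String> fieldNames;

        RecordSchema(String name, Set<String> fieldNames) {
            this.name = name;
            this.fieldNames = fieldNames;
        }
    }

    /** The converted value paired with the number of fields that failed conversion. */
    static final class Conversion {
        final Map<String, Object> value;
        final int failedFields;

        Conversion(Map<String, Object> value, int failedFields) {
            this.value = value;
            this.failedFields = failedFields;
        }
    }

    /** Copies the fields the schema knows about and counts the ones it does not. */
    static Conversion convert(Map<String, Object> raw, RecordSchema schema) {
        Map<String, Object> converted = new LinkedHashMap<>();
        int failed = 0;
        for (Map.Entry<String, Object> entry : raw.entrySet()) {
            if (schema.fieldNames.contains(entry.getKey())) {
                converted.put(entry.getKey(), entry.getValue());
            } else {
                failed++;
            }
        }
        return new Conversion(converted, failed);
    }

    /**
     * Tries each union branch in order and returns the name of the first branch
     * whose conversion reports zero failed fields.
     */
    static String resolveUnion(Map<String, Object> raw, List<RecordSchema> union) {
        for (RecordSchema candidate : union) {
            try {
                Conversion result = convert(raw, candidate);
                if (result.failedFields != 0) {
                    // A non-zero count signals a schema mismatch to the caller.
                    throw new IllegalArgumentException(
                            result.failedFields + " field(s) not recognized by " + candidate.name);
                }
                return candidate.name;
            } catch (IllegalArgumentException mismatch) {
                // This branch does not fit; continue with the next union schema.
            }
        }
        throw new IllegalArgumentException("No union branch recognizes all fields of " + raw);
    }

    public static void main(String[] args) {
        // Mirrors the "left"/"right" records from the schema in the issue description.
        List<RecordSchema> union = List.of(
                new RecordSchema("left", Set.of("f1")),
                new RecordSchema("right", Set.of("f2")));
        System.out.println(resolveUnion(Map.of("f1", "a"), union));
        System.out.println(resolveUnion(Map.of("f2", 42), union));
    }
}
```

In this sketch a value is rejected by a branch as soon as one of its fields is 
unknown to that branch, so a record carrying only "f2" skips "left" and 
resolves to "right".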


> Record-oriented processors/services do not properly support Avro Unions
> -----------------------------------------------------------------------
>
>                 Key: NIFI-5735
>                 URL: https://issues.apache.org/jira/browse/NIFI-5735
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework, Extensions
>    Affects Versions: 1.7.1
>            Reporter: Daniel Solow
>            Priority: Major
>              Labels: AVRO, avro
>         Attachments: 
> 0001-NIFI-5735-added-preliminary-support-for-union-resolu.patch, 
> NIFI-5735.patch
>
>
> The [Avro spec|https://avro.apache.org/docs/1.8.2/spec.html#Unions] states:
> {quote}Unions may not contain more than one schema with the same type, 
> *except for the named types* record, fixed and enum. For example, unions 
> containing two array types or two map types are not permitted, but two types 
> with different names are permitted. (Names permit efficient resolution when 
> reading and writing unions.)
> {quote}
> However, record-oriented processors/services in NiFi do not support multiple 
> named types per union. This is a problem, for example, with the following 
> schema:
> {code:javascript}
> {
>     "type": "record",
>     "name": "root",
>     "fields": [
>         {
>             "name": "children",
>             "type": {
>                 "type": "array",
>                 "items": [
>                     {
>                         "type": "record",
>                         "name": "left",
>                         "fields": [
>                             {
>                                 "name": "f1",
>                                 "type": "string"
>                             }
>                         ]
>                     },
>                     {
>                         "type": "record",
>                         "name": "right",
>                         "fields": [
>                             {
>                                 "name": "f2",
>                                 "type": "int"
>                             }
>                         ]
>                     }
>                 ]
>             }
>         }
>     ]
> }
> {code}
>  This schema contains a field named "children", which is an array whose items 
> are a union. The union contains two possible record types. Currently, the NiFi 
> Avro utilities fail to process records of this schema whose "children" arrays 
> contain both "left" and "right" record types.
> I've traced this bug to the [AvroTypeUtil 
> class|https://github.com/apache/nifi/blob/rel/nifi-1.7.1/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java].
> Specifically, there are bugs in the convertUnionFieldValue method and in the 
> buildAvroSchema method. Both of these methods assume that an Avro union can 
> contain only one child schema of each type. As stated in the spec, this is 
> true for primitive types and unnamed complex types but not for named types.
>  There may be related bugs elsewhere, but I haven't been able to locate them 
> yet.
>  
>  



