Charlie Meyer created NIFI-4551:
-----------------------------------
Summary: JSON to Avro conversion fails for records which have
nested records
Key: NIFI-4551
URL: https://issues.apache.org/jira/browse/NIFI-4551
Project: Apache NiFi
Issue Type: Bug
Components: Core Framework
Affects Versions: 1.4.0
Reporter: Charlie Meyer
Attachments: ExampleObject.avsc, examplePayload.avro,
examplePayload.json, example_object.avdl, nifi_json_avro_bug.xml,
schema_registry_payload.json
JSON to Avro conversion fails for records which have nested records.
Given a confluent schema registry exists at some accessible address
Steps to recreate:
# register the schema:
{{$ curl -H "Content-Type: application/vnd.schemaregistry.v1+json" -d
@schema_registry_payload.json 4.3.2.1:8081/subjects/nifiBug/versions | jq}}
# verify that we can use that schema to convert json to and from avro
{{$ avro-tools fromjson --schema-file ExampleObject.avsc examplePayload.json >
examplePayload.avro
$ avro-tools tojson examplePayload.avro | jq}}
# apply the attached template to nifi: nifi_avro_bug.xml
# start up the components that the template created in nifi
run the following command:
{{$ curl -X POST -d @examplePayload.json http://localhost:5001/ | jq}}
The serialization to avro fails with the following stack trace:
{{
2017-10-30 11:41:02,199 ERROR [Timer-Driven Process Thread-5]
o.a.n.p.k.pubsub.PublishKafkaRecord_0_10
PublishKafkaRecord_0_10[id=19a933c0-f766-1221-4373-21c102ff71ab] Failed to send
all message for
StandardFlowFileRecord[uuid=4834f5cb-f513-49ee-8c3e-305a3acc64b6,claim=StandardContentClaim
[resourceClaim=StandardResourceClaim[id=1509378326140-1, container=default,
section=1], offset=4297, length=156],offset=0,name=75094273920075,size=156] to
Kafka; routing to failure due to
org.apache.avro.file.DataFileWriter$AppendWriteException:
java.lang.NullPointerException: null of string in field name of
com.example.SubtypeA of union in field payload of com.example.ExampleObject: {}
org.apache.avro.file.DataFileWriter$AppendWriteException:
java.lang.NullPointerException: null of string in field name of
com.example.SubtypeA of union in field payload of com.example.ExampleObject
at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:308)
at
org.apache.nifi.avro.WriteAvroResultWithSchema.writeRecord(WriteAvroResultWithSchema.java:61)
at
org.apache.nifi.serialization.AbstractRecordSetWriter.write(AbstractRecordSetWriter.java:59)
at
org.apache.nifi.processors.kafka.pubsub.PublisherLease.publish(PublisherLease.java:114)
at
org.apache.nifi.processors.kafka.pubsub.PublishKafkaRecord_0_10$1.process(PublishKafkaRecord_0_10.java:339)
at
org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2174)
at
org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2144)
at
org.apache.nifi.processors.kafka.pubsub.PublishKafkaRecord_0_10.onTrigger(PublishKafkaRecord_0_10.java:331)
at
org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at
org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1119)
at
org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
at
org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
at
org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:128)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException: null of string in field name of
com.example.SubtypeA of union in field payload of com.example.ExampleObject
at
org.apache.avro.generic.GenericDatumWriter.npe(GenericDatumWriter.java:132)
at
org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:126)
at
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
at
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:60)
at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:302)
... 19 common frames omitted
Caused by: java.lang.NullPointerException: null
at org.apache.avro.io.Encoder.writeString(Encoder.java:121)
at
org.apache.avro.generic.GenericDatumWriter.writeString(GenericDatumWriter.java:254)
at
org.apache.avro.generic.GenericDatumWriter.writeString(GenericDatumWriter.java:249)
at
org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:115)
at
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
at
org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:153)
at
org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:143)
at
org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:105)
at
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
at
org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:112)
at
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
at
org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:153)
at
org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:143)
at
org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:105)
... 22 common frames omitted}}
I did a bit of digging on this one and had a few observations:
When writing to avro, the following code is run to generate the avro record:
[https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java#L558]
Here it iterates over all the fields of the object. This same code appears to
be excuted on thee nested record. When run on the nested record, the schema on
it has an empty list of fields. Thus, when the avro is generated, it has null
values for all fields on the nested record.
This appears to be being set here:
[https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/json/AbstractJsonRowRecordReader.java#L162]
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)