[ https://issues.apache.org/jira/browse/NIFI-7790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17221645#comment-17221645 ]
Pierre Villard commented on NIFI-7790:
--------------------------------------
{noformat}
2020-10-27 19:18:07,954 ERROR [Event-Driven Process Thread-1] o.a.n.processors.standard.ConvertRecord ConvertRecord[id=a2eb82b4-ac5c-32f9-062a-194ed4057ecb] Failed to process StandardFlowFileRecord[uuid=080f5df4-a920-4a82-b99e-8776db105df7,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1603822546154-1, container=default, section=1], offset=2956, length=227],offset=0,name=080f5df4-a920-4a82-b99e-8776db105df7,size=227]; will route to failure: org.apache.avro.SchemaParseException: Can't redefine: org.apache.nifi.itemType
org.apache.avro.SchemaParseException: Can't redefine: org.apache.nifi.itemType
    at org.apache.avro.Schema$Names.put(Schema.java:1128)
    at org.apache.avro.Schema$NamedSchema.writeNameRef(Schema.java:562)
    at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:690)
    at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:882)
    at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:716)
    at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:701)
    at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:882)
    at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:716)
    at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:701)
    at org.apache.avro.Schema.toString(Schema.java:324)
    at org.apache.avro.Schema.toString(Schema.java:314)
    at org.apache.avro.file.DataFileWriter.create(DataFileWriter.java:144)
    at org.apache.avro.file.DataFileWriter.create(DataFileWriter.java:135)
    at org.apache.nifi.avro.WriteAvroResultWithSchema.<init>(WriteAvroResultWithSchema.java:45)
    at org.apache.nifi.avro.AvroRecordSetWriter.createWriter(AvroRecordSetWriter.java:149)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.invoke(StandardControllerServiceInvocationHandler.java:254)
    at org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.invoke(StandardControllerServiceInvocationHandler.java:105)
    at com.sun.proxy.$Proxy376.createWriter(Unknown Source)
    at org.apache.nifi.processors.standard.AbstractRecordProcessor$1.process(AbstractRecordProcessor.java:150)
    at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2988)
    at org.apache.nifi.controller.repository.BatchingSessionFactory$HighThroughputSession.write(BatchingSessionFactory.java:222)
    at org.apache.nifi.processors.standard.AbstractRecordProcessor.onTrigger(AbstractRecordProcessor.java:122)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1174)
    at org.apache.nifi.controller.scheduling.EventDrivenSchedulingAgent$EventDrivenTask.trigger(EventDrivenSchedulingAgent.java:354)
    at org.apache.nifi.controller.scheduling.EventDrivenSchedulingAgent$EventDrivenTask.run(EventDrivenSchedulingAgent.java:233)
    at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{noformat}
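To illustrate what the exception above is complaining about, here is a minimal Python sketch of a name-uniqueness check over a schema written as plain dicts. It is only a simplified model of the rule (two different record definitions may not share one full name), not Avro's actual implementation; the helpers record, nullable, item, document and check_named_types are hypothetical names introduced for this example, and only records and unions are handled.

```python
def record(name, fields, namespace=None):
    """Build a record schema as a plain dict (hypothetical helper)."""
    schema = {"type": "record", "name": name, "fields": fields}
    if namespace:
        schema["namespace"] = namespace
    return schema


def nullable(schema_type):
    """Wrap a type in a ["null", type] union, as in the schemas in this thread."""
    return ["null", schema_type]


def item(type_name, child):
    """A record with a single nullable string field."""
    return record(type_name, [{"name": child, "type": nullable("string")}])


def document(authors_item, editors_item):
    """The authors/editors layout used in this issue."""
    authors = record("authorsType", [{"name": "item", "type": nullable(authors_item)}])
    editors = record("editorsType", [{"name": "item", "type": nullable(editors_item)}])
    return record(
        "nifiRecord",
        [
            {"name": "authors", "type": nullable(authors)},
            {"name": "editors", "type": nullable(editors)},
        ],
        namespace="org.apache.nifi",
    )


def check_named_types(schema, seen=None, enclosing_ns=None):
    """Raise ValueError if two different record definitions share a full name.

    Simplified model: nested records inherit the enclosing namespace, and
    strings (primitives or name references) are ignored.
    """
    if seen is None:
        seen = {}
    if isinstance(schema, list):  # union: check every branch
        for branch in schema:
            check_named_types(branch, seen, enclosing_ns)
    elif isinstance(schema, dict) and schema.get("type") == "record":
        ns = schema.get("namespace", enclosing_ns)
        name = f"{ns}.{schema['name']}" if ns else schema["name"]
        if name in seen and seen[name] != schema:
            raise ValueError(f"Can't redefine: {name}")
        seen[name] = schema
        for field in schema["fields"]:
            check_named_types(field["type"], seen, ns)
    return seen


# Both item records are called itemType but have different fields.
colliding = document(item("itemType", "name"), item("itemType", "commercialName"))
# Distinct names, as in an explicitly provided schema.
explicit = document(item("itemType1", "name"), item("itemType2", "commercialName"))
```

Calling check_named_types(colliding) raises an error for org.apache.nifi.itemType, while check_named_types(explicit) returns the registry of names without complaint.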
Your example that parses correctly (both <item> elements contain <name>)
generates this schema:
{noformat}
{
  "type":"record",
  "name":"nifiRecord",
  "namespace":"org.apache.nifi",
  "fields":[
    {
      "name":"authors",
      "type":[
        "null",
        {
          "type":"record",
          "name":"authorsType",
          "fields":[
            {
              "name":"item",
              "type":[
                "null",
                {
                  "type":"record",
                  "name":"itemType",
                  "fields":[
                    {
                      "name":"name",
                      "type":[
                        "null",
                        "string"
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    },
    {
      "name":"editors",
      "type":[
        "null",
        {
          "type":"record",
          "name":"editorsType",
          "fields":[
            {
              "name":"item",
              "type":[
                "null",
                "itemType"
              ]
            }
          ]
        }
      ]
    }
  ]
}
{noformat}
In the example that fails, the "item" record differs between <authors> and
<editors>. The current form of the processor does not allow this, because
inference gives both records the same Avro "name" (itemType). What I would
recommend is to provide a schema instead of using the schema inference. The
schema below would work with the data in the first example you gave (the one
that fails):
{noformat}
{
  "type":"record",
  "name":"nifiRecord",
  "namespace":"org.apache.nifi",
  "fields":[
    {
      "name":"authors",
      "type":[
        "null",
        {
          "type":"record",
          "name":"authorsType",
          "fields":[
            {
              "name":"item",
              "type":[
                "null",
                {
                  "type":"record",
                  "name":"itemType1",
                  "fields":[
                    {
                      "name":"name",
                      "type":[
                        "null",
                        "string"
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    },
    {
      "name":"editors",
      "type":[
        "null",
        {
          "type":"record",
          "name":"editorsType",
          "fields":[
            {
              "name":"item",
              "type":[
                "null",
                {
                  "type":"record",
                  "name":"itemType2",
                  "fields":[
                    {
                      "name":"commercialName",
                      "type":[
                        "null",
                        "string"
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
{noformat}
We could definitely improve schema inference to support such cases, but
overall it is better to provide a schema.
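As a rough idea of what such an improvement could look like, the Python sketch below compares two ways of naming inferred records: by element name only, and by the element's path under the root. This is not NiFi's inference code; the functions infer, record_names and infer_names and the path-based naming scheme (authors_itemType, editors_itemType) are assumptions made up for this example, and repeated elements and attributes are ignored.

```python
import xml.etree.ElementTree as ET

# The failing document from the issue description, without the XML declaration.
FAILING_XML = """
<root>
  <authors><item><name>Neil Gaiman</name></item></authors>
  <editors><item><commercialName>Hachette</commercialName></item></editors>
</root>
""".strip()


def infer(element, path, qualify):
    """Return an Avro-like nullable type for one XML element.

    Leaf elements become nullable strings. Elements with children become
    records named after the element tag, or after the full path when
    qualify is True (hypothetical naming scheme).
    """
    children = list(element)
    if not children:
        return ["null", "string"]
    here = path + [element.tag]
    base = "_".join(here) if qualify else element.tag
    fields = [{"name": c.tag, "type": infer(c, here, qualify)} for c in children]
    return ["null", {"type": "record", "name": base + "Type", "fields": fields}]


def record_names(schema_type):
    """Collect record names in depth-first order."""
    names = []
    if isinstance(schema_type, list):
        for branch in schema_type:
            names += record_names(branch)
    elif isinstance(schema_type, dict):
        names.append(schema_type["name"])
        for field in schema_type["fields"]:
            names += record_names(field["type"])
    return names


def infer_names(xml_text, qualify):
    """Infer a type for each child of the root and list the record names."""
    names = []
    for child in ET.fromstring(xml_text):
        names += record_names(infer(child, [], qualify))
    return names


if __name__ == "__main__":
    print(infer_names(FAILING_XML, qualify=False))
    print(infer_names(FAILING_XML, qualify=True))
```

With element-only naming, itemType appears twice with different fields, which is the collision reported in this issue. With path-based naming each record gets its own name, at the cost of longer names and of no longer sharing one definition between identical records.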
> XML record reader - failure on well-formed XML
> ----------------------------------------------
>
> Key: NIFI-7790
> URL: https://issues.apache.org/jira/browse/NIFI-7790
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.11.4
> Reporter: Pierre Gramme
> Priority: Major
> Labels: records, xml
> Attachments: bug-parse-xml.xml
>
>
> I am using ConvertRecord to parse XML flowfiles to Avro, with the
> Infer Schema strategy. Some input flowfiles are routed to the failure
> queue even though they are well-formed:
> {code:xml}
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <root>
>   <authors>
>     <item>
>       <name>Neil Gaiman</name>
>     </item>
>   </authors>
>   <editors>
>     <item>
>       <commercialName>Hachette</commercialName>
>     </item>
>   </editors>
> </root>
> {code}
> Note the use of authors/item/name on one side, and
> editors/item/commercialName on the other side.
> On the other hand, this gets correctly parsed:
> {code:xml}
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <root>
>   <authors>
>     <item>
>       <name>Neil Gaiman</name>
>     </item>
>   </authors>
>   <editors>
>     <item>
>       <name>Hachette</name>
>     </item>
>   </editors>
> </root>
> {code}
> See the attached template for a minimal reproducible example.
>
> My interpretation is that the failure in the first case is due to 2
> independent XML node types having the same name (<item> in this case) but
> having different types and occurring in different parents with different
> types. In the second case, both <item>'s actually have the same node type. I
> didn't use any Schema Inference Cache, so both item types should be inferred
> independently.
> Since the first document is legal XML (an XSD could be written for it) and
> can also be represented in Avro, its conversion shouldn't fail.
> I'll be happy to provide more details if needed.
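The interpretation in the description can be checked with a small stdlib-only Python sketch that groups elements by tag and reports tags whose child elements differ between occurrences. It is only an illustration of the reasoning, not part of NiFi; shapes_by_tag and conflicting_tags are hypothetical helpers, and the XML declaration is left out of the sample strings for brevity.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# First document from the description (routed to failure).
FAILING_XML = """
<root>
  <authors><item><name>Neil Gaiman</name></item></authors>
  <editors><item><commercialName>Hachette</commercialName></item></editors>
</root>
""".strip()

# Second document from the description (parsed correctly).
WORKING_XML = """
<root>
  <authors><item><name>Neil Gaiman</name></item></authors>
  <editors><item><name>Hachette</name></item></editors>
</root>
""".strip()


def shapes_by_tag(xml_text):
    """Map each non-leaf tag to the set of child-tag tuples seen for it."""
    shapes = defaultdict(set)
    for element in ET.fromstring(xml_text).iter():
        child_tags = tuple(sorted(child.tag for child in element))
        if child_tags:  # leaf elements carry only text, so skip them
            shapes[element.tag].add(child_tags)
    return shapes


def conflicting_tags(xml_text):
    """Tags that occur with more than one distinct set of children."""
    shapes = shapes_by_tag(xml_text)
    return sorted(tag for tag, seen in shapes.items() if len(seen) > 1)


if __name__ == "__main__":
    print("failing document:", conflicting_tags(FAILING_XML))
    print("working document:", conflicting_tags(WORKING_XML))
```

Only the first document has a tag (item) with two different shapes, which matches the explanation that a record name derived from the element name alone cannot describe both.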