[ https://issues.apache.org/jira/browse/NIFI-7790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pierre Villard resolved NIFI-7790.
----------------------------------
Resolution: Feedback Received
Apache NiFi 1.x is no longer maintained and no new release is planned on the
1.x release line. Marking as resolved as part of a cleanup operation. Please
open a new issue with an updated description if this is still relevant for
NiFi 2.x.
> XML record reader - failure on well-formed XML
> ----------------------------------------------
>
> Key: NIFI-7790
> URL: https://issues.apache.org/jira/browse/NIFI-7790
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.11.4
> Reporter: Pierre Gramme
> Priority: Major
> Labels: records, xml
> Attachments: bug-parse-xml.xml
>
>
> I am using ConvertRecord to parse XML flowfiles to Avro, with the
> Infer Schema strategy. Some input flowfiles are sent to the failure output
> queue even though they are well-formed, for example:
> {code:xml}
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <root>
>   <authors>
>     <item>
>       <name>Neil Gaiman</name>
>     </item>
>   </authors>
>   <editors>
>     <item>
>       <commercialName>Hachette</commercialName>
>     </item>
>   </editors>
> </root>
> {code}
> Note that this document uses authors/item/name in one branch and
> editors/item/commercialName in the other.
> By contrast, the following document is parsed correctly:
> {code:xml}
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <root>
>   <authors>
>     <item>
>       <name>Neil Gaiman</name>
>     </item>
>   </authors>
>   <editors>
>     <item>
>       <name>Hachette</name>
>     </item>
>   </editors>
> </root>
> {code}
> See the attached template for a minimal reproducible example.
>
> My interpretation is that the first case fails because two independent XML
> node types share the same name (<item> in this case) but have different
> types and occur in parents of different types. In the second case, both
> <item> elements actually have the same node type. I didn't use any Schema
> Inference Cache, so both item types should be inferred independently.
> Since the first document is legal XML (an XSD could be written for it) and
> can also be represented in Avro, its conversion shouldn't fail.
> I'll be happy to provide more details if needed.
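
The reporter's interpretation can be illustrated outside NiFi with a small sketch. It is not NiFi's inference code: the function name `child_fields_by_key` and both grouping strategies are assumptions made purely for illustration. It collects the child fields of each element in the failing document, first keyed by element name alone and then keyed by the full path from the root. The XML declaration is omitted from the sample for brevity.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The failing document from the report, without the XML declaration.
FAILING_DOC = """
<root>
  <authors>
    <item>
      <name>Neil Gaiman</name>
    </item>
  </authors>
  <editors>
    <item>
      <commercialName>Hachette</commercialName>
    </item>
  </editors>
</root>
"""


def child_fields_by_key(xml_text, key_by_path):
    """Collect the child element names of every non-leaf element.

    key_by_path=False groups elements by tag name only (hypothetical
    name-based inference); key_by_path=True groups them by their full
    path from the root (hypothetical path-based inference).
    """
    fields = defaultdict(set)

    def walk(elem, path):
        children = list(elem)
        if not children:
            return  # leaf element: it carries a value, not fields
        key = path if key_by_path else elem.tag
        fields[key].update(child.tag for child in children)
        for child in children:
            walk(child, f"{path}/{child.tag}")

    root = ET.fromstring(xml_text)
    walk(root, root.tag)
    return dict(fields)


by_name = child_fields_by_key(FAILING_DOC, key_by_path=False)
by_path = child_fields_by_key(FAILING_DOC, key_by_path=True)

# Keyed by name, the two unrelated <item> types are merged into one.
print(sorted(by_name["item"]))
# Keyed by path, each <item> keeps its own field set.
print(sorted(by_path["root/authors/item"]))
print(sorted(by_path["root/editors/item"]))
```

Under the name-only grouping, the two `<item>` types end up sharing a single entry with both `name` and `commercialName`, whereas the path-based grouping keeps them apart. Whether a collision of this kind is what actually triggers the failure in ConvertRecord is not confirmed by this report.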