[ 
https://issues.apache.org/jira/browse/NIFI-7790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Villard resolved NIFI-7790.
----------------------------------
    Resolution: Feedback Received

Apache NiFi 1.x is no longer maintained, and no new release is planned on the 
1.x release line. Marking this issue as resolved as part of a cleanup 
operation. Please open a new issue with an updated description if this is 
still relevant for NiFi 2.x.

> XML record reader - failure on well-formed XML
> ----------------------------------------------
>
>                 Key: NIFI-7790
>                 URL: https://issues.apache.org/jira/browse/NIFI-7790
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.11.4
>            Reporter: Pierre Gramme
>            Priority: Major
>              Labels: records, xml
>         Attachments: bug-parse-xml.xml
>
>
> I am using ConvertRecord to parse XML flowfiles to Avro, with the Infer 
> Schema strategy. Some input flowfiles are routed to the failure output 
> queue even though they are well-formed: 
> {code:xml}
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <root>
>       <authors>
>               <item>
>                       <name>Neil Gaiman</name>
>               </item>
>       </authors>
>       <editors>
>               <item>
>                       <commercialName>Hachette</commercialName>
>               </item>
>       </editors>
> </root>
> {code}
> Note the use of authors/item/name in one branch and 
> editors/item/commercialName in the other.
> By contrast, the following document is parsed correctly: 
> {code:xml}
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <root>
>       <authors>
>               <item>
>                       <name>Neil Gaiman</name>
>               </item>
>       </authors>
>       <editors>
>               <item>
>                       <name>Hachette</name>
>               </item>
>       </editors>
> </root>
> {code}
> See the attached template for a minimal reproducible example.
>  
> My interpretation is that the first case fails because two independent XML 
> node types share the same name (<item> in this case) but have different 
> types and occur in different parents with different types. In the second 
> case, both <item> elements actually have the same node type. I didn't use 
> any Schema Inference Cache, so both item types should be inferred 
> independently. 
> Since the first document is legal XML (an XSD could be written for it) and 
> can also be represented in Avro, its conversion shouldn't fail.
> I'll be happy to provide more details if needed.
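
The reporter's interpretation can be illustrated with a small Python sketch. It does not reproduce NiFi's schema inference code; it only shows how the two <item> elements look when element types are keyed by full path versus by bare tag name. The helper name child_names and its key_by_path flag are hypothetical, introduced for this illustration only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The failing document from the report (XML declaration omitted for brevity).
DOC = """<root>
  <authors><item><name>Neil Gaiman</name></item></authors>
  <editors><item><commercialName>Hachette</commercialName></item></editors>
</root>"""


def child_names(xml_text, key_by_path):
    """Map each element to the set of its child tag names.

    Elements are identified by their full path when key_by_path is True,
    and by their bare tag name otherwise. Hypothetical helper, not NiFi code.
    """
    seen = defaultdict(set)

    def walk(elem, path):
        key = path if key_by_path else elem.tag
        for child in elem:
            seen[key].add(child.tag)
            walk(child, f"{path}/{child.tag}")

    root = ET.fromstring(xml_text)
    walk(root, root.tag)
    return dict(seen)


by_path = child_names(DOC, key_by_path=True)
by_name = child_names(DOC, key_by_path=False)

# Keyed by path, the two <item> elements stay independent record types.
print(sorted(by_path["root/authors/item"]))
print(sorted(by_path["root/editors/item"]))
# Keyed by name only, both shapes collapse into a single "item" entry.
print(sorted(by_name["item"]))
```

Keyed by path, each <item> has exactly one child field, which matches the claim that both types could be inferred independently. Keyed by name, the two shapes merge into one entry; whether a collision of this kind is what actually triggers the failure in ConvertRecord is an assumption that the sketch does not verify.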



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
