Pierre Gramme created NIFI-7790:
-----------------------------------

             Summary: XML record reader - failure on well-formed XML
                 Key: NIFI-7790
                 URL: https://issues.apache.org/jira/browse/NIFI-7790
             Project: Apache NiFi
          Issue Type: Bug
          Components: Extensions
    Affects Versions: 1.11.4
            Reporter: Pierre Gramme
         Attachments: bug-parse-xml.xml

I am using ConvertRecord to parse XML flowfiles into Avro, with the 
Infer Schema strategy. Some input flowfiles are sent to the failure output 
queue even though they are well-formed, for example: 
{code:xml}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
        <authors>
                <item>
                        <name>Neil Gaiman</name>
                </item>
        </authors>
        <editors>
                <item>
                        <commercialName>Hachette</commercialName>
                </item>
        </editors>
</root>
{code}
Note the use of authors/item/name on one side, and editors/item/commercialName 
on the other side.

On the other hand, this gets correctly parsed: 
{code:xml}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
        <authors>
                <item>
                        <name>Neil Gaiman</name>
                </item>
        </authors>
        <editors>
                <item>
                        <name>Hachette</name>
                </item>
        </editors>
</root>
{code}
See the attached template for a minimal reproducible example.

My interpretation is that the first case fails because two independent XML 
element types share the same name (<item> in this case) but have different 
content and occur under parents that themselves have different types. In the 
second case, both <item> elements have the same structure. I didn't use any 
Schema Inference Cache, so the two item types should be inferred independently. 
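
The interpretation above can be illustrated outside NiFi. The following is a 
minimal sketch, not NiFi's actual inference code: it assumes a hypothetical 
inference that identifies a record type by element name only, and compares it 
with one that identifies it by full path. The helper names (child_fields, 
conflicting_names) are made up for this illustration, and the XML declaration 
is omitted from the sample documents for brevity.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The two documents from this report (XML declaration omitted).
FAILING_DOC = """<root>
  <authors><item><name>Neil Gaiman</name></item></authors>
  <editors><item><commercialName>Hachette</commercialName></item></editors>
</root>"""

PASSING_DOC = """<root>
  <authors><item><name>Neil Gaiman</name></item></authors>
  <editors><item><name>Hachette</name></item></editors>
</root>"""


def child_fields(xml_text, key_by_path):
    """Map each non-leaf element to the distinct sets of child tags seen for it.

    key_by_path=False keys elements by tag name only (assumed, hypothetical
    behaviour); key_by_path=True keys them by their full path from the root.
    """
    shapes = defaultdict(set)

    def walk(elem, path):
        children = list(elem)
        if not children:
            return  # leaf element: a scalar value, not a record
        key = path if key_by_path else elem.tag
        shapes[key].add(frozenset(child.tag for child in children))
        for child in children:
            walk(child, path + "/" + child.tag)

    root = ET.fromstring(xml_text)
    walk(root, root.tag)
    return shapes


def conflicting_names(xml_text, key_by_path=False):
    """Return the keys that were seen with more than one distinct field set."""
    shapes = child_fields(xml_text, key_by_path)
    return sorted(key for key, seen in shapes.items() if len(seen) > 1)


if __name__ == "__main__":
    print(conflicting_names(FAILING_DOC))                    # keyed by name
    print(conflicting_names(PASSING_DOC))                    # keyed by name
    print(conflicting_names(FAILING_DOC, key_by_path=True))  # keyed by path
```

Under the name-only assumption, only the first document yields an element 
name with two different field sets; keyed by path, neither document does.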

Since the first document is legal XML (an XSD could be written for it) and can 
also be represented in Avro, its conversion shouldn't fail.
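
To make the Avro claim concrete, here is a sketch of one schema that could 
describe the first document, written as a Python dict. The record names 
(authors_item, editors_item, and so on) are hypothetical choices for this 
example and are not what NiFi generates. The point is only that the two item 
records can coexist when their record names differ, since the Avro 
specification does not allow the same full name to be defined twice in a 
schema. The check below is a plain-Python walk and does not validate the 
schema with an Avro library.

```python
import json


def item_record(record_name, field_name):
    """Build a record with one nullable string field."""
    return {
        "type": "record",
        "name": record_name,
        "fields": [{"name": field_name, "type": ["null", "string"]}],
    }


def wrapper_record(record_name, item_schema):
    """Build a record whose single field is called 'item'."""
    return {
        "type": "record",
        "name": record_name,
        "fields": [{"name": "item", "type": item_schema}],
    }


# Hypothetical schema: both fields are named "item", but the record
# types behind them have distinct names.
SCHEMA = {
    "type": "record",
    "name": "root",
    "fields": [
        {"name": "authors",
         "type": wrapper_record("authors", item_record("authors_item", "name"))},
        {"name": "editors",
         "type": wrapper_record("editors", item_record("editors_item", "commercialName"))},
    ],
}


def record_names(schema):
    """Collect the name of every record definition found in a schema dict."""
    names = []
    if isinstance(schema, dict) and schema.get("type") == "record":
        names.append(schema["name"])
        for field in schema["fields"]:
            names.extend(record_names(field["type"]))
    elif isinstance(schema, list):  # a union: inspect each branch
        for branch in schema:
            names.extend(record_names(branch))
    return names


if __name__ == "__main__":
    names = record_names(SCHEMA)
    print(names)
    print("unique record names:", len(names) == len(set(names)))
    print(json.dumps(SCHEMA, indent=2))
```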

I'll be happy to provide more details if needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)