Andrew Chafos created NIFI-7697:
-----------------------------------

             Summary: NiFi XMLReader Record Component sometimes ignores empty XML Elements
                 Key: NIFI-7697
                 URL: https://issues.apache.org/jira/browse/NIFI-7697
             Project: Apache NiFi
          Issue Type: Bug
          Components: Core Framework
    Affects Versions: 1.11.4
         Environment: Windows 10
            Reporter: Andrew Chafos


I am developing a processor for Apache NiFi that must be configured with an 
implementation of RecordReaderFactory that produces well-formed NiFi Records 
from the input data.

The JsonTreeReader component produced accurate results for all of my test 
cases.  However, at least with its default configuration, the XMLReader 
component sometimes mishandles data: specifically, empty XML elements nested 
inside XML elements that are represented as Arrays in NiFi Records are missing 
from the output.

This occurs when I test using the standard ConvertRecord NiFi Processor and set 
the Reader to XMLReader and the Writer to JsonRecordSetWriter.

The first two test cases work as expected:

*Test Case 1:*

Input XML:
{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<Root>
   <DataArr>SomeData</DataArr>
   <DataArr>
      <Field>
         <NonEmptyField>2</NonEmptyField>
      </Field>
   </DataArr>
</Root>
{code}
Output JSON:
{code:json}
[
   {
      "DataArr":[
         "SomeData",
         "MapRecord[{Field=MapRecord[{NonEmptyField=2}]}]"
      ]
   }
]
{code}
*Test Case 2:*

Input XML:
{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<Root>
   <SomeData />
   <MoreData>2</MoreData>
</Root>
{code}

Output JSON:
{code:json}
[
   {
      "SomeData":null,
      "MoreData":2
   }
]
{code}

However, the following does *not* work as expected:

*Test Case 3:*

Input XML:
{code:xml}
<Root>
   <DataArr>SomeData</DataArr>
   <DataArr>
      <Field>
         <EmptyField/>
      </Field>
   </DataArr>
</Root>
{code}

Output JSON:
{code:json}
[
   {
      "DataArr":[
         "SomeData"
      ]
   }
]
{code}

My processor depends on Field and EmptyField appearing in the JSON output for 
Test Case 3 and for all analogous inputs. Here, the second DataArr entry is 
missing from the output entirely.
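
As a sanity check independent of NiFi, the following is a minimal sketch that walks the Test Case 3 input with the JDK's StAX parser and lists every element path it encounters. The class name EmptyElementProbe and the method elementPaths are hypothetical names chosen for illustration. This is not a description of how XMLReader is implemented; it only confirms that a standard parser reports EmptyField as a regular element.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of NiFi: lists element paths seen by StAX.
public class EmptyElementProbe {

    /** Returns the slash-separated path of every element, in document order. */
    public static List<String> elementPaths(String xml) {
        Deque<String> stack = new ArrayDeque<>();
        List<String> paths = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // An empty element such as <EmptyField/> still produces a
                    // START_ELEMENT event followed by an END_ELEMENT event.
                    stack.addLast(reader.getLocalName());
                    paths.add("/" + String.join("/", stack));
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    stack.removeLast();
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        return paths;
    }

    public static void main(String[] args) {
        // Input of Test Case 3.
        String xml = "<Root><DataArr>SomeData</DataArr>"
                + "<DataArr><Field><EmptyField/></Field></DataArr></Root>";
        elementPaths(xml).forEach(System.out::println);
    }
}
```

If /Root/DataArr/Field/EmptyField appears in the printed list, the element is present in the parsed input, which would suggest that it is lost somewhere after XML parsing rather than by the parser itself.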

I have tried supplying a custom NiFi RecordSchema to the components and 
verified that it was being used, but I got the same results.

Is there a way to configure these controller services so that this empty field 
is not ignored, or is this a bug in the XMLReader component?

You can reproduce these results by running ConvertRecord in NiFi as described 
above, or by running the following JUnit test with testXml replaced by the 
input of the relevant test case:

{code:java}
import org.apache.nifi.controller.ControllerService;
import org.apache.nifi.json.JsonRecordSetWriter;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processors.standard.ConvertRecord;
import org.apache.nifi.reporting.InitializationException;
import org.apache.nifi.util.MockFlowFile;
import org.apache.nifi.util.TestRunner;
import org.apache.nifi.util.TestRunners;
import org.apache.nifi.xml.XMLReader;
import org.junit.Test;

public class TestNiFiMinimal {
    @Test
    public void testEmptyXMLGetsProcessed() throws InitializationException {
        ConvertRecord convertRecord = new ConvertRecord();
        TestRunner testRunner = TestRunners.newTestRunner(convertRecord);
        ControllerService xmlReader = new XMLReader();
        testRunner.addControllerService("xmlReader", xmlReader);
        testRunner.enableControllerService(xmlReader);
        testRunner.setProperty("record-reader", "xmlReader");
        ControllerService jsonWriter = new JsonRecordSetWriter();
        testRunner.addControllerService("jsonWriter", jsonWriter);
        testRunner.enableControllerService(jsonWriter);
        testRunner.setProperty("record-writer", "jsonWriter");
        String testXml = "<?xml version='1.0' encoding='UTF-8'?><Root><DataArr>SomeData</DataArr><DataArr><Field><EmptyField/></Field></DataArr></Root>";
        testRunner.enqueue(testXml);
        testRunner.run();
        // Look up the "success" relationship by name.
        Relationship success = convertRecord.getRelationships().stream()
                .filter(relationship -> relationship.getName().equals("success"))
                .findAny().get();
        testRunner.assertAllFlowFilesTransferred(success);
        final MockFlowFile original = testRunner.getFlowFilesForRelationship(success).get(0);
        // Placeholder expectation: replace "" with the expected JSON. As written,
        // this assertion fails unless the output is empty.
        original.assertContentEquals("");
    }
}
{code}



