Andrew Chafos created NIFI-7697:
-----------------------------------
Summary: NiFi XMLReader Record Component sometimes ignores empty XML Elements
Key: NIFI-7697
URL: https://issues.apache.org/jira/browse/NIFI-7697
Project: Apache NiFi
Issue Type: Bug
Components: Core Framework
Affects Versions: 1.11.4
Environment: Windows 10
Reporter: Andrew Chafos
I am developing a processor for Apache NiFi that must be configured with an
implementation of RecordReaderFactory that produces well-formed NiFi Records
from its input data.
The JsonTreeReader component produced accurate results for all of my test
cases. However, at least with the default configuration, the XMLReader
component sometimes drops data: specifically, empty XML elements nested inside
XML elements that are represented as Arrays in NiFi Records.
This occurs when I test using the standard ConvertRecord NiFi Processor and set
the Reader to XMLReader and the Writer to JsonRecordSetWriter.
These first 2 test cases work as expected:
*Test Case 1:*
Input XML:
{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<Root>
<DataArr>SomeData</DataArr>
<DataArr>
<Field>
<NonEmptyField>2</NonEmptyField>
</Field>
</DataArr>
</Root>
{code}
Output JSON:
{code:json}
[
{
"DataArr":[
"SomeData",
"MapRecord[{Field=MapRecord[{NonEmptyField=2}]}]"
]
}
]
{code}
*Test Case 2:*
Input XML:
{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<Root>
<SomeData />
<MoreData>2</MoreData>
</Root>
{code}
Output JSON:
{code:json}
[
{
"SomeData":null,
"MoreData":2
}
]
{code}
However, the following does *not* work as expected:
*Test Case 3:*
Input XML:
{code:xml}
<Root>
<DataArr>SomeData</DataArr>
<DataArr>
<Field>
<EmptyField/>
</Field>
</DataArr>
</Root>
{code}
Output JSON:
{code:json}
[
{
"DataArr":[
"SomeData"
]
}
]
{code}
For my Processor to function, Field and EmptyField must appear in the JSON
output for Test Case 3 and for all other inputs analogous to it.
I have tried to supply a custom NiFi RecordSchema to the components and
verified it was being used, but I got the same results.
Is there a way to configure these controllers such that this empty field is not
ignored, or is this a bug in the XMLReader component?
You can reproduce these results by running ConvertRecord in NiFi as described
above, or by running this JUnit test with testXml replaced by the relevant
test case:
{code:java}
import org.apache.nifi.controller.ControllerService;
import org.apache.nifi.json.JsonRecordSetWriter;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processors.standard.ConvertRecord;
import org.apache.nifi.reporting.InitializationException;
import org.apache.nifi.util.MockFlowFile;
import org.apache.nifi.util.TestRunner;
import org.apache.nifi.util.TestRunners;
import org.apache.nifi.xml.XMLReader;
import org.junit.Test;

public class TestNiFiMinimal {

    @Test
    public void testEmptyXMLGetsProcessed() throws InitializationException {
        ConvertRecord convertRecord = new ConvertRecord();
        TestRunner testRunner = TestRunners.newTestRunner(convertRecord);

        // Register and enable the XMLReader with its default configuration.
        ControllerService xmlReader = new XMLReader();
        testRunner.addControllerService("xmlReader", xmlReader);
        testRunner.enableControllerService(xmlReader);
        testRunner.setProperty("record-reader", "xmlReader");

        // Register and enable the JsonRecordSetWriter with its default configuration.
        ControllerService jsonWriter = new JsonRecordSetWriter();
        testRunner.addControllerService("jsonWriter", jsonWriter);
        testRunner.enableControllerService(jsonWriter);
        testRunner.setProperty("record-writer", "jsonWriter");

        // Test Case 3; swap in the XML of another test case here.
        String testXml = "<?xml version='1.0' encoding='UTF-8'?>"
                + "<Root><DataArr>SomeData</DataArr><DataArr><Field><EmptyField/></Field></DataArr></Root>";
        testRunner.enqueue(testXml);
        testRunner.run();

        Relationship success = convertRecord.getRelationships().stream()
                .filter(relationship -> relationship.getName().equals("success"))
                .findAny()
                .get();
        testRunner.assertAllFlowFilesTransferred(success);

        final MockFlowFile original = testRunner.getFlowFilesForRelationship(success).get(0);
        // Placeholder expected value; replace it with the JSON expected for the chosen test case.
        original.assertContentEquals("");
    }
}
{code}
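As a sanity check that does not involve NiFi at all, the sketch below uses only the JDK's StAX API to list the element names a standard XML parser reports for the Test Case 3 input. It is not NiFi code and makes no claim about how XMLReader is implemented; the class name EmptyElementCheck and the method elementNames are hypothetical names chosen for illustration. If a plain parser reports EmptyField, that would suggest the element is lost somewhere after parsing, though this sketch alone cannot confirm where.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of NiFi: lists element names seen by a StAX parser.
public class EmptyElementCheck {

    // Returns the local name of every start element, in document order.
    public static List<String> elementNames(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                // An empty element such as <EmptyField/> is reported as
                // START_ELEMENT immediately followed by END_ELEMENT.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Same document as Test Case 3.
        String testXml = "<Root><DataArr>SomeData</DataArr>"
                + "<DataArr><Field><EmptyField/></Field></DataArr></Root>";
        // prints [Root, DataArr, DataArr, Field, EmptyField]
        System.out.println(elementNames(testXml));
    }
}
```

Both Field and EmptyField appear in the list, so the input itself is well-formed and the empty element is visible to a standard parser.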