Gábor Gyimesi created NIFI-12238:
------------------------------------

             Summary: SplitText trims text ending character when max fragment 
size is specified and multiple endlines are present
                 Key: NIFI-12238
                 URL: https://issues.apache.org/jira/browse/NIFI-12238
             Project: Apache NiFi
          Issue Type: Bug
            Reporter: Gábor Gyimesi
            Assignee: Gábor Gyimesi


The SplitText processor appears to trim too much when a maximum fragment size 
is specified and a fragment ends with several endline characters that are to 
be trimmed. In that case the processor removes not only the trailing endline 
characters but also the last character of the text.

Test case:
{code:java}
@Test
public void testMaxFragmentSizeWithTrimmedEndlines() {
    final TestRunner splitRunner = TestRunners.newTestRunner(new SplitText());
    splitRunner.setProperty(SplitText.HEADER_LINE_COUNT, "2");
    splitRunner.setProperty(SplitText.LINE_SPLIT_COUNT, "0");
    splitRunner.setProperty(SplitText.FRAGMENT_MAX_SIZE, "30 B");
    splitRunner.setProperty(SplitText.REMOVE_TRAILING_NEWLINES, "true");

    splitRunner.enqueue("header1\nheader2\nline1 longer than limit\nline2\nline3\n\n\n\n\n");

    splitRunner.run();
    splitRunner.assertTransferCount(SplitText.REL_SPLITS, 3);
    splitRunner.assertTransferCount(SplitText.REL_ORIGINAL, 1);
    splitRunner.assertTransferCount(SplitText.REL_FAILURE, 0);

    final List<MockFlowFile> splits = splitRunner.getFlowFilesForRelationship(SplitText.REL_SPLITS);
    splits.get(0).assertContentEquals("header1\nheader2\nline1 longer than limit");
    splits.get(1).assertContentEquals("header1\nheader2\nline2\nline3");
    splits.get(2).assertContentEquals("header1\nheader2");
}
{code}

Result:

{code}
expected: <header1
header2
line2
line3> but was: <header1
header2
line2
line>
{code}
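
For reference, the trimming the test expects can be sketched in isolation. This is only a minimal illustration, not the SplitText implementation: the class TrailingNewlineTrim and its methods trimmedLength and trimTrailingNewlines are hypothetical names introduced here. The sketch walks backwards from the end of the content and stops at the first byte that is not an endline character, so the last character of the text is never dropped.

```java
import java.nio.charset.StandardCharsets;

public class TrailingNewlineTrim {

    // Hypothetical helper (not NiFi code): returns the content length once
    // trailing '\n' and '\r' bytes are excluded. Only endline bytes are
    // skipped, so the final text character always stays inside the range.
    static int trimmedLength(final byte[] content) {
        int end = content.length;
        while (end > 0 && (content[end - 1] == '\n' || content[end - 1] == '\r')) {
            end--;
        }
        return end;
    }

    // Convenience wrapper for strings; assumes UTF-8, where the bytes for
    // '\n' and '\r' never occur inside a multi-byte character.
    static String trimTrailingNewlines(final String text) {
        final byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, 0, trimmedLength(bytes), StandardCharsets.UTF_8);
    }

    public static void main(final String[] args) {
        // Same tail as the second split in the test case above.
        System.out.println(trimTrailingNewlines("header1\nheader2\nline2\nline3\n\n\n\n\n"));
    }
}
```

Applied to the content of the second split, this keeps "line3" intact regardless of how many endline characters follow it, which is the behaviour the failing assertion expects.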




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
