[
https://issues.apache.org/jira/browse/NIFI-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201708#comment-15201708
]
Joseph Witt commented on NIFI-1649:
-----------------------------------
Mark Bean added a comment - 09/Mar/16 14:56
What is the intent of the 'Remove Trailing Newlines' property? I believe the
intent is to remove the End Of Line (EOL) character from the last line of each
split file along with any additional lines that consist of nothing other than
the EOL character (i.e., blank lines). It seems to work fine when there is data
other than blank lines. However, blank lines result in odd behavior. For
example, with Header Line Count = 0, Line Split Count = 3, Remove Trailing
Newlines = true, and an input file whose lines 4-9 consist of only '\n', I
have observed the second split file containing only 2 (blank) lines.
Essentially, only the last line of the split has its EOL removed.
Even more concerning is the case when Header Line Count is specified (and
therefore all lines are written to an output stream rather than simply cloning
segments of the input flowfile). Here, when a split file consists of nothing
but blank lines, not only is that split file not output, but no subsequent
split files are generated. The splitting is effectively stopped because the
processor mistakes the empty split file for End Of File. This is a bug.
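
The failure mode described above can be illustrated with a minimal sketch. This
is not the SplitText source code; the function names and the list-of-lines
input are hypothetical, and header handling is omitted for brevity. The first
function stops as soon as a split comes out empty, while the second decides
End Of File from the input position instead. Skipping the all-blank split in
the second function is only one possible choice.

```python
def split_stops_on_empty(lines, split_count):
    """Hypothetical buggy loop: an empty split is mistaken for End Of File."""
    splits, pos = [], 0
    while True:
        chunk = lines[pos:pos + split_count]
        pos += split_count
        # Simplified "Remove Trailing Newlines": drop trailing blank lines.
        while chunk and chunk[-1] == "\n":
            chunk.pop()
        if not chunk:
            # Wrong: the split may be empty only because it was all blank.
            break
        splits.append(chunk)
    return splits


def split_tracks_eof(lines, split_count):
    """Hypothetical fix: End Of File is decided by the input position."""
    splits, pos = [], 0
    while pos < len(lines):
        chunk = lines[pos:pos + split_count]
        pos += split_count
        while chunk and chunk[-1] == "\n":
            chunk.pop()
        if chunk:
            splits.append(chunk)
    return splits


# Lines 4-6 are blank, but real data follows them.
sample = ["a\n", "b\n", "c\n", "\n", "\n", "\n", "d\n", "e\n", "f\n"]
buggy = split_stops_on_empty(sample, 3)
fixed = split_tracks_eof(sample, 3)
```

With this input, the first function returns only the first split, and the
second also returns the split holding the last three lines.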
This can be addressed in the redesign of the SplitText processor. However,
"proper" behavior needs to be well-defined. Additionally, I strongly recommend
that the last line of the split file contain exactly the same contents as the
corresponding line of the original flowfile. In other words, keep the EOL
character. Removing it becomes highly problematic when splitting on maximum
size. In such cases, you never know you're on the last line of the split file
until the next line is read (and exceeds the limit). Further, the behavior of
a split file consisting of only blank lines (when Remove Trailing Newlines is
true) needs to be clearly defined.
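
To show why the last line is only known in hindsight when splitting on maximum
size, here is a rough sketch. The function name and the max_bytes parameter
are assumptions made for illustration, not SplitText properties. Each line is
appended unchanged as soon as it is read; the split is closed only when the
next line would exceed the limit, which is the point at which the previous
line turns out to have been the last one.

```python
def split_by_max_size(lines, max_bytes):
    """Hypothetical size-based splitter that keeps every EOL intact."""
    splits, current, size = [], [], 0
    for line in lines:
        line_size = len(line.encode("utf-8"))
        # Only now do we learn that the previous line ended the split.
        if current and size + line_size > max_bytes:
            splits.append(current)
            current, size = [], 0
        # The line is kept exactly as read, EOL included.
        current.append(line)
        size += line_size
    if current:
        splits.append(current)
    return splits


size_splits = split_by_max_size(["aaaa\n", "bb\n", "cccc\n"], 8)
```

Stripping the EOL from the last line would require holding back every line
until its successor has been read, which this approach avoids.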
Suggestions: keep the EOL on every line, and remove only trailing blank lines.
Further, in cases where Remove Trailing Newlines is true and a split consists
of only newlines, the split should consist of a single blank line.
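
The suggested rule could look roughly like the following sketch. The function
name is hypothetical, and treating both '\n' and '\r\n' as a blank line is an
assumption; the comment itself only mentions '\n'.

```python
def apply_suggested_trailing_rule(split_lines):
    """Sketch of the suggestion: keep EOLs, drop only trailing blank lines."""
    blank = ("\n", "\r\n")  # assumption: either EOL style counts as blank
    result = list(split_lines)
    while result and result[-1] in blank:
        result.pop()
    if not result and split_lines:
        # The split was nothing but newlines: keep a single blank line.
        return [split_lines[0]]
    return result


with_data = apply_suggested_trailing_rule(["a\n", "\n", "\n"])
only_blank = apply_suggested_trailing_rule(["\n", "\n", "\n"])
interior = apply_suggested_trailing_rule(["a\n", "\n", "b\n"])
```

Here the last data line keeps its EOL, interior blank lines are untouched, and
an all-blank split collapses to one blank line rather than disappearing.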
Please comment.
> SplitText end of line handling is incorrect
> -------------------------------------------
>
> Key: NIFI-1649
> URL: https://issues.apache.org/jira/browse/NIFI-1649
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Reporter: Joseph Witt
> Assignee: Joseph Witt
> Priority: Critical
> Fix For: 0.6.0
>
>
> Lengthy discussion about this in NIFI-1118
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)