[
https://issues.apache.org/jira/browse/NIFI-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187800#comment-15187800
]
Mark Bean commented on NIFI-1118:
---------------------------------
What is the intent of the 'Remove Trailing Newlines' property? I believe the
intent is to remove the End Of Line (EOL) character from the last line of each
split file, along with any additional lines that consist of nothing other than
the EOL character (i.e., blank lines). It seems to work fine when there is data
other than blank lines. However, blank lines result in odd behavior. For
example, I have observed the second split file containing only 2 (blank) lines
in a case where Header Line Count = 0, Line Split Count = 3, Remove Trailing
Newlines = true, and lines 4-9 of the input file consist of only '\n'.
Essentially, only the last line of the split has its EOL removed.

Even more concerning is the case when Header Line Count is specified (and
therefore all lines are written to an output stream rather than simply cloning
segments of the input flowfile). Here, when a split file consists of nothing
but blank lines, not only is that split file not output, but no subsequent
split files are generated. Splitting effectively stops because the processor
treats the empty split file as the result of End Of File. This is a bug.

This can be addressed in the redesign of the SplitText processor. However,
"proper" behavior needs to be well-defined. Additionally, I strongly recommend
that the last line of the split file contain the exact contents of the
corresponding line in the original flowfile. In other words, keep the EOL
character. Removing it becomes highly problematic when splitting on maximum
size. In such cases, you cannot know that you are on the last line of the split
file until the next line is read and found to exceed the limit. Further, the
behavior of a split file consisting of only blank lines (when Remove Trailing
Newlines is true) needs to be clearly defined.
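
To illustrate the look-ahead problem, here is a minimal Python sketch of
splitting on maximum size. It is not NiFi code: the function name
split_by_max_size and the max_bytes parameter are hypothetical, and it assumes
that a single line larger than the limit is placed in a split of its own.

```python
def split_by_max_size(lines, max_bytes):
    """Group lines (each still ending in its EOL) into splits of at most
    max_bytes bytes. Hypothetical sketch, not the SplitText implementation."""
    splits = []
    current = []
    current_size = 0
    for line in lines:
        size = len(line.encode("utf-8"))
        # Only now, with the next line in hand, do we learn that the
        # previous line was the last line of the current split.
        if current and current_size + size > max_bytes:
            splits.append("".join(current))
            current, current_size = [], 0
        current.append(line)
        current_size += size
    if current:
        splits.append("".join(current))
    return splits
```

Because the split is closed only after the following line has been read,
stripping the EOL from the last line would mean going back and rewriting data
that has already been added to the split. Keeping the EOL avoids that.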

Suggestions: include the EOL on all lines, and remove only trailing blank
lines. Further, in cases where Remove Trailing Newlines is true and a split
consists of only newlines, the split should consist of a single blank line.
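
The suggested rule could be sketched roughly as follows. This is a Python
illustration only, not the processor's code: trim_split and split_by_count are
hypothetical names, and the sketch assumes that Remove Trailing Newlines is
true, that there are no header lines, and that lines end in '\n' or '\r\n'.

```python
def trim_split(lines):
    """Apply the suggested rule to one split.

    `lines` is a list of strings, each still ending in its EOL.
    """
    kept = list(lines)
    # Drop trailing lines that are nothing but an EOL (blank lines).
    while kept and kept[-1] in ("\n", "\r\n"):
        kept.pop()
    if not kept and lines:
        # The split was all blank lines: keep a single blank line.
        return [lines[0]]
    return kept


def split_by_count(text, line_split_count):
    """Split text every line_split_count lines, then trim each split."""
    lines = text.splitlines(keepends=True)  # every line keeps its EOL
    splits = []
    for start in range(0, len(lines), line_split_count):
        chunk = lines[start:start + line_split_count]
        splits.append("".join(trim_split(chunk)))
    return splits
```

With three lines of data followed by six blank lines and a count of 3, this
sketch yields the data split with every EOL intact, followed by two splits that
each hold a single blank line, so an all-blank split can never be mistaken for
End Of File.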

Please comment.

> Enable SplitText processor to limit line length and filter header lines
> -----------------------------------------------------------------------
>
> Key: NIFI-1118
> URL: https://issues.apache.org/jira/browse/NIFI-1118
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Mark Bean
> Assignee: Joe Skora
> Fix For: 0.6.0
>
>
> Add the following functionality to the SplitText processor:
> 1) Maximum size limit of the split file(s)
> A new split file will be created if the next line to be added to the current
> split file exceeds a user-defined maximum file size
> 2) Header line marker
> User-defined character(s) can be used to identify the header line(s) of the
> data file rather than a predetermined number of lines
> These changes are additions, not a replacement of any property or behavior.
> In the case of header line marker, the existing property "Header Line Count"
> must be zero for the new property and behavior to be used.
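
One possible reading of the header line marker in item 2 is sketched below in
Python. The function name split_header is hypothetical, and the sketch assumes
that header lines are the leading consecutive lines that begin with the marker.
The issue text does not spell out those semantics.

```python
def split_header(lines, marker):
    """Separate leading header lines from data lines.

    Assumption: a header line is any leading line that starts with
    `marker`; the first line without the marker ends the header.
    """
    if not marker:
        raise ValueError("marker must be a non-empty string")
    header = []
    for line in lines:
        if not line.startswith(marker):
            break
        header.append(line)
    return header, lines[len(header):]
```

Under this reading, a line that begins with the marker after the data has
started is treated as ordinary data rather than as another header line.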