[
https://issues.apache.org/jira/browse/NIFI-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201671#comment-15201671
]
Mark Bean commented on NIFI-1118:
---------------------------------
When weighing the impact on users, consider how their data is currently
affected if they have chosen to set Remove Trailing Newlines to true:
- If an input FlowFile has a number of blank lines greater than the Line Split
Count property, the remainder of the FlowFile will not be processed, potentially
resulting in silent data loss.
- If an input FlowFile has X blank lines at the end of a file and Header Line
Count = 0, only one newline will be removed from each split FlowFile. In the case
where X is greater than Line Split Count, there will be split files consisting
of nothing but blank lines, specifically one fewer line than Line Split Count
(i.e., only the final newline character is removed).
- If an input FlowFile has X blank lines at the end of a file and Header Line
Count = 1 (or any non-zero value), the blank lines are removed and no split
file of just blanks is created. However, the final line of the last split file
still contains a newline character, whereas in the other split files the final
line has the newline character removed.
- If an input FlowFile has X blank lines at the end of a file and Line Split
Count is greater than the number of lines in the file, no newlines are removed.

Coming up with a concise, accurate description of what this property does is
nearly impossible, and I would have a difficult time providing such information
to a user who had a question on its usage.

Bottom line: where the behavior of this feature is not outright incorrect (such
as not processing the full FlowFile), it is wildly inconsistent. Is it worth
maintaining such a feature simply because it is already available for use? I'm
not intending to be argumentative, but rather asking a philosophical question
on how best to proceed with the greatest benefit to users.

> Enable SplitText processor to limit line length and filter header lines
> -----------------------------------------------------------------------
>
> Key: NIFI-1118
> URL: https://issues.apache.org/jira/browse/NIFI-1118
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Mark Bean
> Assignee: Joe Skora
> Fix For: 0.6.0
>
>
> Add the following functionality to the SplitText processor:
> 1) Maximum size limit of the split file(s)
> A new split file will be created if adding the next line to the current
> split file would cause it to exceed a user-defined maximum file size
> 2) Header line marker
> User-defined character(s) can be used to identify the header line(s) of the
> data file rather than a predetermined number of lines
> These changes are additions, not a replacement of any property or behavior.
> In the case of header line marker, the existing property "Header Line Count"
> must be zero for the new property and behavior to be used.
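
For illustration only, the two additions described in the issue can be sketched
roughly as follows. This is not the SplitText implementation: the function name
split_lines and the parameters max_bytes and header_marker are hypothetical,
and the sketch assumes that header lines are the leading lines starting with
the marker and that they are copied into every split file.

```python
def split_lines(lines, max_bytes, header_marker=None):
    """Hypothetical sketch of size-limited splitting with a header marker.

    Assumptions (not taken from NiFi's code): sizes are counted in UTF-8
    bytes, header lines are the leading lines that start with
    `header_marker`, and the header is repeated in every split.
    """
    header = []
    if header_marker:
        # Collect leading lines that begin with the marker as the header.
        for line in lines:
            if not line.startswith(header_marker):
                break
            header.append(line)

    header_size = sum(len(h.encode("utf-8")) for h in header)
    splits = []
    current = list(header)
    size = header_size

    for line in lines[len(header):]:
        n = len(line.encode("utf-8"))
        has_body = len(current) > len(header)
        # Start a new split if this line would push the current one past
        # the limit. A split always receives at least one body line, even
        # if that single line is itself larger than the limit.
        if has_body and size + n > max_bytes:
            splits.append(current)
            current = list(header)
            size = header_size
        current.append(line)
        size += n

    # Emit the last split only if it holds something besides the header.
    if len(current) > len(header):
        splits.append(current)
    return splits
```

In this sketch a split holding only header lines is never emitted, and an
oversized single line still gets a split of its own; both are design choices
made for the example, not behavior stated in the issue.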