[
https://issues.apache.org/jira/browse/NIFI-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189233#comment-15189233
]
Mark Bean commented on NIFI-1118:
---------------------------------
I agree that every line should be treated the same. I was thinking it would be
easy to remove blank lines at the end of the file, but that leads back to the
problem of knowing ahead of time that you are reaching the end of the split.
That isn't possible without adding significant overhead, unless we consider
only the very last line of the split, which is in fact how things are done now.
My test case confirms that when a split ends with multiple blank lines, only the
last of them is removed.
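To make the required statefulness concrete, here is a minimal Python sketch.
It is not NiFi's implementation (SplitText is Java), and the function names are
hypothetical. drop_last_blank_line models the current behavior described
above. drop_all_trailing_blank_lines is a streaming variant that removes every
trailing blank line. Because lines are emitted as they are read, it must hold
back each run of blank lines until it learns whether a non-blank line follows.
In the worst case (a split made up entirely of blank lines) that buffer grows to
the size of the whole split.

```python
from typing import Iterable, Iterator, List


def drop_last_blank_line(lines: List[str]) -> List[str]:
    """Current behavior as described: only the final blank line is dropped."""
    if lines and lines[-1].strip() == "":
        return lines[:-1]
    return lines


def drop_all_trailing_blank_lines(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical streaming variant that drops every trailing blank line.

    A blank line cannot be written when it is read, because we do not yet
    know whether a non-blank line follows before the split ends. Blank lines
    are therefore buffered (the extra state) and flushed only when a
    non-blank line arrives.
    """
    pending: List[str] = []
    for line in lines:
        if line.strip() == "":
            pending.append(line)      # hold back: might be trailing
        else:
            yield from pending        # interior blank lines are kept
            pending.clear()
            yield line
    # Whatever is still in `pending` is a run of trailing blank lines: discard.
```

Interior blank lines are preserved. Only the run at the end of the split is
dropped, and only after the whole split has been seen.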
Additionally, as mentioned previously, when Remove Trailing Newlines is true,
SplitText stops splitting the flowfile once an entire split consists only of
blank lines, so any data after that point could be lost. Clearly, this is not
desirable.
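One way this kind of failure can arise is sketched below in Python. This is a
hypothetical illustration and not NiFi's actual SplitText code. Suppose a loop
treats an empty split as "end of input." Then a split that trimming reduces to
nothing looks the same as running out of data, and everything after it is never
emitted. The second function decides end of input by how many lines were
consumed instead. Both functions, and the "drop only the final blank line"
trimming rule they share, are assumptions made for illustration.

```python
from typing import List

Split = List[str]


def _trim(chunk: Split) -> Split:
    # Assumed trimming rule, mirroring the behavior described above:
    # drop only the final blank line of the split.
    return chunk[:-1] if chunk and chunk[-1].strip() == "" else chunk


def split_stops_early(lines: List[str], n: int) -> List[Split]:
    """HYPOTHETICAL buggy loop: an empty trimmed split is taken to mean EOF."""
    out: List[Split] = []
    pos = 0
    while True:
        chunk = _trim(lines[pos:pos + n])
        if not chunk:  # an all-blank split trims to [] and looks like EOF
            break
        out.append(chunk)
        pos += n
    return out


def split_until_consumed(lines: List[str], n: int) -> List[Split]:
    """End of input is decided by lines consumed, not by what survives trim."""
    out: List[Split] = []
    for pos in range(0, len(lines), n):
        chunk = _trim(lines[pos:pos + n])
        if chunk:  # an all-blank split is skipped, but splitting continues
            out.append(chunk)
    return out
```

With one line per split, split_stops_early(["a", "", "b", "c"], 1) returns only
[["a"]], while split_until_consumed returns [["a"], ["b"], ["c"]]. Whether an
all-blank split should be skipped or emitted as an empty flowfile is a separate
design question.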
Given the admittedly buggy behavior of the Remove Trailing Newlines property,
its limited benefit, and the complexity and statefulness required to handle it
'properly', I recommend removing this property. The overhead doesn't seem
justified by such a limited use case. Yes, removing it may affect existing users
(keeping in mind that existing users are only removing the final newline of a
split). However, that quickly becomes a question of which is better: a solid
processor, or one that stays buggy simply because it has always been that way.
> Enable SplitText processor to limit line length and filter header lines
> -----------------------------------------------------------------------
>
> Key: NIFI-1118
> URL: https://issues.apache.org/jira/browse/NIFI-1118
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Mark Bean
> Assignee: Joe Skora
> Fix For: 0.6.0
>
>
> Add the following functionality to the SplitText processor:
> 1) Maximum size limit of the split file(s)
> A new split file will be created if adding the next line to the current
> split file would exceed a user-defined maximum file size
> 2) Header line marker
> User-defined character(s) can be used to identify the header line(s) of the
> data file rather than a predetermined number of lines
> These changes are additions, not a replacement of any property or behavior.
> In the case of header line marker, the existing property "Header Line Count"
> must be zero for the new property and behavior to be used.
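The two requested features can be sketched together in Python. This is a
minimal illustration and not the NiFi implementation. It relies on several
assumptions that the issue does not state. Header lines are taken to be the
leading lines that begin with the marker. The size limit counts the UTF-8 bytes
of body lines, including their line terminators, and excludes the header. A
single line larger than the limit gets a split of its own rather than being
rejected. The parameter names are hypothetical and are not NiFi property names.
The precedence rule, where the marker is only consulted when Header Line Count
is zero, follows the issue text.

```python
from typing import List, Optional, Tuple


def split_with_limits(
    lines: List[str],
    header_line_count: int = 0,
    header_marker: Optional[str] = None,
    max_split_bytes: Optional[int] = None,
) -> Tuple[List[str], List[List[str]]]:
    """Return (header_lines, splits). Lines include their terminators."""
    # Header selection: a positive count wins; the marker is used only
    # when header_line_count == 0, as the issue requires.
    if header_line_count > 0:
        header, body = lines[:header_line_count], lines[header_line_count:]
    elif header_marker:
        i = 0
        while i < len(lines) and lines[i].startswith(header_marker):
            i += 1  # assumption: header = leading lines with the marker
        header, body = lines[:i], lines[i:]
    else:
        header, body = [], lines

    splits: List[List[str]] = []
    current: List[str] = []
    size = 0
    for line in body:
        n = len(line.encode("utf-8"))
        # Greedy rule: start a new split if adding this line would push the
        # current split past the limit (an oversized line still gets a split).
        if max_split_bytes is not None and current and size + n > max_split_bytes:
            splits.append(current)
            current, size = [], 0
        current.append(line)
        size += n
    if current:
        splits.append(current)
    return header, splits
```

For example, with header_marker="#" and max_split_bytes=10, the lines
["#h1\n", "#h2\n", "aaaa\n", "bbbb\n", "cc\n"] yield the header
["#h1\n", "#h2\n"] and the splits [["aaaa\n", "bbbb\n"], ["cc\n"]]. The last
body line starts a new split because adding it would bring the current split to
13 bytes.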
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)