[jira] [Commented] (NIFI-1118) Enable SplitText processor to limit line length and filter header lines

Mark Bean (JIRA) Sat, 19 Mar 2016 04:04:01 -0700

    [ 
https://issues.apache.org/jira/browse/NIFI-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201671#comment-15201671
 ]


Mark Bean commented on NIFI-1118:
---------------------------------

When weighing the impact on users, consider how their data is currently 
affected when Remove Trailing Newlines is set to true:

- If an input FlowFile contains more blank lines than the Line Split Count 
property, the remainder of the FlowFile is not processed, potentially 
resulting in silent data loss.
- If an input FlowFile has X blank lines at the end of a file and Header Line 
Count = 0, only one newline is removed from each FlowFile. When X is greater 
than Line Split Count, there will be split files consisting of nothing but 
blank lines, specifically one fewer line than Line Split Count (i.e. only the 
final newline character is removed).
- If an input FlowFile has X blank lines at the end of a file and Header Line 
Count = 1 (or any non-zero value), the blank lines are removed and no split 
file of just blanks is created. However, the final line does contain a newline 
character. In other split files, the final line has the newline character 
removed.
- If an input FlowFile has X blank lines at the end of a file and Line Split 
Count is greater than the number of lines in the file, no newlines are removed.

Coming up with a concise, accurate description of what this property does is 
nearly impossible, and I would have a difficult time providing such information 
to a user who had a question on its usage.

Bottom line: where the behavior of this feature is not outright incorrect 
(such as not processing the full FlowFile), it is wildly inconsistent. Is it 
worth maintaining such a feature simply because it is already available for 
use? I do not intend to be argumentative; I am asking a philosophical question 
about how best to proceed with the greatest benefit to users.

> Enable SplitText processor to limit line length and filter header lines
> -----------------------------------------------------------------------
>
>                 Key: NIFI-1118
>                 URL: https://issues.apache.org/jira/browse/NIFI-1118
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Mark Bean
>            Assignee: Joe Skora
>             Fix For: 0.6.0
>
>
> Include the following functionality to the SplitText processor:
> 1) Maximum size limit of the split file(s)
> A new split file will be created if the next line to be added to the current 
> split file exceeds a user-defined maximum file size
> 2) Header line marker
> User-defined character(s) can be used to identify the header line(s) of the 
> data file rather than a predetermined number of lines
> These changes are additions, not a replacement of any property or behavior. 
> In the case of header line marker, the existing property "Header Line Count" 
> must be zero for the new property and behavior to be used.
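
For illustration only, the two additions described above might be sketched 
as follows. This is not the SplitText implementation: the function name 
split_lines and the parameters max_bytes and header_marker are hypothetical, 
and copying the header lines into every split is an assumption carried over 
from how Header Line Count behaves.

```python
def split_lines(lines, max_bytes, header_marker=None):
    """Hypothetical sketch of size-limited splitting with a header marker.

    Not NiFi's SplitText code; names and details are assumptions.
    """
    # Leading lines that start with the marker are treated as header lines.
    header = []
    if header_marker:
        for line in lines:
            if not line.startswith(header_marker):
                break
            header.append(line)

    header_size = sum(len(h.encode("utf-8")) for h in header)
    splits = []
    current = list(header)  # assumption: header is repeated in each split
    current_size = header_size

    for line in lines[len(header):]:
        line_size = len(line.encode("utf-8"))
        # Start a new split when the next line would exceed the limit,
        # unless the current split holds nothing but the header.
        if current_size + line_size > max_bytes and len(current) > len(header):
            splits.append(current)
            current = list(header)
            current_size = header_size
        current.append(line)
        current_size += line_size

    # Emit the last split only if it holds at least one body line.
    if len(current) > len(header):
        splits.append(current)
    return splits
```

In this sketch a single line larger than max_bytes still gets a split of its 
own rather than being dropped, and an input consisting only of header lines 
yields no splits. Both are design choices of the sketch, not documented 
SplitText behavior.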



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
