[ https://issues.apache.org/jira/browse/NIFI-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15109592#comment-15109592 ]

ASF GitHub Bot commented on NIFI-1118:
--------------------------------------

Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/135#discussion_r50328396
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/SplitText.java ---
    @@ -198,23 +208,53 @@ private long countBytesToSplitPoint(final InputStream in, final OutputStream out
                     return includeLineDelimiter ? bytesRead : bytesRead - 1;
                 }
     
    -            // keep track of what the last byte was that we read so that we can detect \r followed by some other
    +            // keep track of what the last byte was that we read so that we can
    +            // detect \r followed by some other
                 // character.
                 lastByte = nextByte;
             }
         }
     
    -    private SplitInfo countBytesToSplitPoint(final InputStream in, final int numLines, final boolean keepAllNewLines) throws IOException {
    +    private SplitInfo readHeader(final int numHeaderLines,
    +                                 final String headerMarker, final InputStream in,
    +                                 final OutputStream out, final boolean keepAllNewLines)
    +                                throws IOException {
             SplitInfo info = new SplitInfo();
    -
    -        while (info.lengthLines < numLines) {
    -            final long bytesTillNext = countBytesToSplitPoint(in, null, keepAllNewLines || (info.lengthLines != numLines - 1));
    -            if (bytesTillNext <= 0L) {
    -                break;
    +        boolean isHeaderLine = true;
    +
    +        // Read numHeaderLines from file, if specificed; a non-zero value takes precedence
    +        // over headerMarker character string
    +        if (numHeaderLines > 0) {
    +            for (int i = 0; i < numHeaderLines; i++) {
    +                int bytesRead = readLine(in, out, keepAllNewLines);
    +                if (bytesRead == 0) {
    +                    break;
    +                }
    +                info.lengthBytes += bytesRead;
    +                info.lengthLines++;
    +            }
    +        // Else, keep reading all lines that begin with headerMarker character string
    +        } else if (headerMarker != null) {
    +            while (true) {
    +                in.mark(0);
    --- End diff --
    
    This call to in.mark(0) sets a read limit of 0 bytes, so the mark is not 
    guaranteed to survive any read, and a later in.reset() may not return to it. 
    That essentially makes the mark/reset do nothing useful. I am not sure I follow 
    how mark/reset is meant to be used here.
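
    For reference, the documented contract of InputStream.mark(int readlimit) is
    that readlimit is the number of bytes that may be read before the mark can be
    invalidated. The sketch below shows one way to peek at the start of a line and
    then rewind, by passing a readlimit at least as large as the number of bytes
    read before reset(). It is an illustration only, not the code from this pull
    request: the class MarkResetSketch and the method startsWithMarker are
    hypothetical names, and UTF-8 encoding of the marker is an assumption.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of SplitText: peeks at a stream using mark/reset. */
class MarkResetSketch {

    /**
     * Returns true if the next bytes of the stream equal the marker.
     * The stream is rewound to its starting position before returning.
     */
    static boolean startsWithMarker(final InputStream in, final String marker) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        // Assumption: the marker is compared as UTF-8 bytes.
        final byte[] expected = marker.getBytes(StandardCharsets.UTF_8);

        // Request a readlimit covering every byte we may read before reset().
        // A readlimit of 0 would give no guarantee once a single byte is read.
        in.mark(expected.length);
        try {
            for (final byte b : expected) {
                // read() returns -1 at end of stream, which never matches a byte value.
                if (in.read() != (b & 0xFF)) {
                    return false;
                }
            }
            return true;
        } finally {
            // Rewind so the caller sees the stream exactly as it was.
            in.reset();
        }
    }

    public static void main(final String[] args) throws IOException {
        final byte[] data = "#header\ndata\n".getBytes(StandardCharsets.UTF_8);
        final InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        System.out.println("starts with marker: " + startsWithMarker(in, "#"));
        System.out.println("next byte after peek: " + (char) in.read());
    }
}
```

    Wrapping the source in a BufferedInputStream is one way to obtain mark/reset
    support when the underlying stream does not provide it.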


> Enable SplitText processor to limit line length and filter header lines
> -----------------------------------------------------------------------
>
>                 Key: NIFI-1118
>                 URL: https://issues.apache.org/jira/browse/NIFI-1118
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Mark Bean
>            Assignee: Joe Skora
>             Fix For: 0.5.0
>
>
> Add the following functionality to the SplitText processor:
> 1) Maximum size limit of the split file(s)
> A new split file will be created if adding the next line to the current 
> split file would exceed a user-defined maximum file size
> 2) Header line marker
> User-defined character(s) can be used to identify the header line(s) of the 
> data file rather than a predetermined number of lines
> These changes are additions, not a replacement of any property or behavior. 
> In the case of header line marker, the existing property "Header Line Count" 
> must be zero for the new property and behavior to be used.
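
The precedence rule described above can be sketched as follows. This is a
minimal illustration working on lines already held in memory, not the
processor's implementation: the class HeaderRuleSketch and the method
countHeaderLines are hypothetical names, and treating a null or empty marker
as "no header" is an assumption.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of the header rule; not the SplitText implementation. */
class HeaderRuleSketch {

    /**
     * Returns how many leading lines are treated as header lines.
     * A non-zero headerLineCount takes precedence; the marker is only
     * consulted when headerLineCount is zero.
     */
    static int countHeaderLines(final List<String> lines,
                                final int headerLineCount,
                                final String headerMarker) {
        if (headerLineCount > 0) {
            // Fixed-count behavior: never report more lines than exist.
            return Math.min(headerLineCount, lines.size());
        }
        // Assumption: a null or empty marker means no header lines.
        if (headerMarker == null || headerMarker.isEmpty()) {
            return 0;
        }
        // Marker behavior: count consecutive leading lines that start with the marker.
        int count = 0;
        while (count < lines.size() && lines.get(count).startsWith(headerMarker)) {
            count++;
        }
        return count;
    }

    public static void main(final String[] args) {
        final List<String> lines = Arrays.asList("#a", "#b", "row1", "#late", "row2");
        System.out.println(countHeaderLines(lines, 0, "#"));
        System.out.println(countHeaderLines(lines, 1, "#"));
    }
}
```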



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)