[jira] [Commented] (NIFI-1118) Enable SplitText processor to limit line length and filter header lines

ASF GitHub Bot (JIRA) Wed, 01 Jun 2016 17:52:12 -0700

    [ https://issues.apache.org/jira/browse/NIFI-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311514#comment-15311514 ]


ASF GitHub Bot commented on NIFI-1118:
--------------------------------------

Github user markobean commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/444#discussion_r65468881
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/SplitText.java ---
    @@ -147,138 +193,200 @@ protected void init(final ProcessorInitializationContext context) {
             return properties;
         }
     
    -    private int readLines(final InputStream in, final int maxNumLines, final OutputStream out, final boolean keepAllNewLines, final byte[] leadingNewLineBytes) throws IOException {
    +    private int readLines(final InputStream in, final int maxNumLines, final long maxByteCount, final OutputStream out,
    +                          final boolean includeLineDelimiter, final byte[] leadingNewLineBytes) throws IOException {
             final EndOfLineBuffer eolBuffer = new EndOfLineBuffer();
     
    -        int numLines = 0;
             byte[] leadingBytes = leadingNewLineBytes;
    +        int numLines = 0;
    +        long totalBytes = 0L;
             for (int i = 0; i < maxNumLines; i++) {
    -            final EndOfLineMarker eolMarker = locateEndOfLine(in, out, false, eolBuffer, leadingBytes);
    +            final EndOfLineMarker eolMarker = countBytesToSplitPoint(in, out, totalBytes, maxByteCount, includeLineDelimiter, eolBuffer, leadingBytes);
    +            final long bytes = eolMarker.getBytesConsumed();
                 leadingBytes = eolMarker.getLeadingNewLineBytes();
     
    -            if (keepAllNewLines && out != null) {
    +            if (includeLineDelimiter && out != null) {
                     if (leadingBytes != null) {
                         out.write(leadingBytes);
                         leadingBytes = null;
                     }
    -
                     eolBuffer.drainTo(out);
                 }
    -
    -            if (eolBuffer.length() > 0 || eolMarker.getBytesConsumed() > 0L) {
    -                numLines++;
    +            totalBytes += bytes;
    +            if (bytes <= 0) {
    +                return numLines;
                 }
    -
    -            if (eolMarker.isStreamEnded()) {
    +            numLines++;
    +            if (totalBytes >= maxByteCount) {
                     break;
                 }
             }
    -
             return numLines;
         }
     
    -    private EndOfLineMarker locateEndOfLine(final InputStream in, final OutputStream out, final boolean includeLineDelimiter,
    -        final EndOfLineBuffer eolBuffer, final byte[] leadingNewLineBytes) throws IOException {
    -
    +    private EndOfLineMarker countBytesToSplitPoint(final InputStream in, final OutputStream out, final long bytesReadSoFar, final long maxSize,
    +                                                   final boolean includeLineDelimiter, final EndOfLineBuffer eolBuffer, final byte[] leadingNewLineBytes) throws IOException {
             int lastByte = -1;
             long bytesRead = 0L;
    +        final ByteArrayOutputStream buffer;
    +        if (out != null) {
    +            buffer = new ByteArrayOutputStream();
    +        } else {
    +            buffer = null;
    +        }
    --- End diff --
    
    This guard made more sense when the method could be called with 'out' set
    to null, which an earlier version did by design. Now it is only called with
    a valid OutputStream. Still, I think the check does no harm and adds a layer
    of safety should the method be called differently in the future. If you
    still object, the if/else can be removed and the final buffer instantiated
    outright; I confirmed that all unit tests pass in that configuration,
    although that change is not staged for commit at this time.
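
For reference, the two options discussed in the comment above can be sketched side by side. This is only an illustration under stated assumptions: the class BufferGuardSketch and the methods stageLineGuarded and stageLine are hypothetical names, not part of SplitText, and the sketch only mimics the shape of the buffer allocation, not the real countBytesToSplitPoint logic.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical illustration; not the SplitText implementation.
class BufferGuardSketch {

    // Guarded variant, mirroring the if/else in the diff: the staging buffer
    // exists only when there is a destination stream to write to.
    static long stageLineGuarded(final byte[] line, final OutputStream out) throws IOException {
        final ByteArrayOutputStream buffer;
        if (out != null) {
            buffer = new ByteArrayOutputStream();
        } else {
            buffer = null;
        }
        if (buffer != null) {
            buffer.write(line, 0, line.length);
            buffer.writeTo(out);
        }
        return line.length; // bytes are counted whether or not they are written
    }

    // Simplified variant: the buffer is instantiated outright, so the caller
    // must always supply a non-null OutputStream.
    static long stageLine(final byte[] line, final OutputStream out) throws IOException {
        final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        buffer.write(line, 0, line.length);
        buffer.writeTo(out);
        return line.length;
    }
}
```

With a non-null stream both variants behave the same; they differ only when 'out' is null, where the guarded variant still returns a byte count and the simplified one fails with a NullPointerException.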


> Enable SplitText processor to limit line length and filter header lines
> -----------------------------------------------------------------------
>
>                 Key: NIFI-1118
>                 URL: https://issues.apache.org/jira/browse/NIFI-1118
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Mark Bean
>            Assignee: Mark Bean
>             Fix For: 0.7.0
>
>
> Add the following functionality to the SplitText processor:
> 1) Maximum size limit of the split file(s)
> A new split file will be created if adding the next line to the current 
> split file would exceed a user-defined maximum file size.
> 2) Header line marker
> User-defined character(s) can be used to identify the header line(s) of the 
> data file, rather than a predetermined number of lines.
> These changes are additions; they do not replace any existing property or behavior. 
> In the case of the header line marker, the existing property "Header Line Count" 
> must be zero for the new property and behavior to be used.
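
The two behaviors requested in the description above can be sketched roughly as follows. This is not the SplitText implementation: the class SplitSketch and its methods are hypothetical, the input is assumed to be already broken into lines held in memory, and two details the description leaves open are filled in by assumption (a single line larger than the limit is placed alone in its own split, and header lines are taken to be the leading consecutive lines that start with the marker).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the SplitText implementation.
class SplitSketch {

    // Start a new split whenever adding the next line would push the current
    // split past maxBytes. Assumption: an oversized line gets its own split.
    static List<List<String>> splitBySize(final List<String> lines, final long maxBytes) {
        final List<List<String>> splits = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0L;
        for (final String line : lines) {
            final long lineBytes = line.getBytes(StandardCharsets.UTF_8).length;
            if (!current.isEmpty() && currentBytes + lineBytes > maxBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0L;
            }
            current.add(line);
            currentBytes += lineBytes;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    // Count header lines by marker instead of by a fixed number.
    // Assumption: the header is the leading run of lines starting with marker.
    static int countHeaderLines(final List<String> lines, final String marker) {
        int count = 0;
        for (final String line : lines) {
            if (!line.startsWith(marker)) {
                break;
            }
            count++;
        }
        return count;
    }
}
```

For example, three 3-byte lines with a 6-byte limit would yield one split holding the first two lines and a second split holding the third.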



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
