[jira] [Commented] (NIFI-2876) Refactor TextLineDemarcator and StreamDemarcator into a common abstract class

ASF GitHub Bot (JIRA) Thu, 23 Feb 2017 09:54:00 -0800

    [ https://issues.apache.org/jira/browse/NIFI-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15880935#comment-15880935 ]


ASF GitHub Bot commented on NIFI-2876:
--------------------------------------

Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1214#discussion_r102774788
  
    --- Diff: nifi-commons/nifi-utils/src/main/java/org/apache/nifi/stream/io/util/AbstractDemarcator.java ---
    @@ -98,22 +136,26 @@ void fill() throws IOException {
             }
     
             int bytesRead;
    +        /*
    +         * The do/while pattern is used here similar to the way it is used in
    +         * BufferedReader essentially protecting from assuming the EOS until it
    +         * actually is since not every implementation of InputStream guarantees
    +         * that bytes are always available while the stream is open.
    +         */
             do {
                 bytesRead = this.is.read(this.buffer, this.index, this.buffer.length - this.index);
             } while (bytesRead == 0);
    -        this.bufferLength = bytesRead != -1 ? this.index + bytesRead : -1;
    -        if (this.bufferLength > this.maxDataSize) {
    +        this.availableBytesLength = bytesRead != -1 ? this.index + bytesRead : -1;
    +        if (this.availableBytesLength > this.maxDataSize) {
    --- End diff --
    
    This logic seems wrong to me. It is saying, "If we have more data available
    in the buffer than the largest token that is allowed, we should fail." But a
    buffer holding more than maxDataSize bytes does not mean that any single token
    is too large. The following unit test fails, though I believe it should pass:
    
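    ```
    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertNotNull;
    import static org.junit.Assert.assertNull;

    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.charset.StandardCharsets;

    import org.apache.nifi.stream.io.util.StreamDemarcator;
    import org.junit.Test;

    @Test
    public void testLargeBufferSmallMaxSize() throws IOException {
        final byte[] inputData = "A Great Benefit To Us All".getBytes(StandardCharsets.UTF_8);

        // maxDataSize = 24, initialBufferSize = 4096: every token is well under 24 bytes
        try (final InputStream is = new ByteArrayInputStream(inputData);
            final StreamDemarcator demarcator = new StreamDemarcator(is, "B".getBytes(StandardCharsets.UTF_8), 24, 4096)) {

            final byte[] first = demarcator.nextToken();
            assertNotNull(first);
            assertEquals("A Great ", new String(first, StandardCharsets.UTF_8));

            final byte[] second = demarcator.nextToken();
            assertNotNull(second);
            assertEquals("enefit To Us All", new String(second, StandardCharsets.UTF_8));

            assertNull(demarcator.nextToken());
        }
    }
    ```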
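To make the review point concrete, here is a minimal sketch, assuming a simplified stand-alone class. The name `PerTokenDemarcatorSketch` and its methods are hypothetical and are not NiFi's `StreamDemarcator` API. It keeps the do/while refill from the diff (retrying while `read()` returns 0, treating only -1 as end of stream) but applies `maxDataSize` to each token instead of to the number of buffered bytes, so a 4096-byte buffer with a 24-byte limit yields the two tokens the test above expects.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical sketch, not NiFi's StreamDemarcator: splits a stream on a byte
 * delimiter and applies maxDataSize to each token, independent of the buffer size.
 */
public class PerTokenDemarcatorSketch {
    private final InputStream is;
    private final byte[] delimiter;
    private final int maxDataSize;
    private final byte[] buffer;   // read buffer; may be much larger than maxDataSize
    private int index;             // next unread position in buffer
    private int available;         // number of valid bytes in buffer
    private boolean eos;           // true once read() has returned -1
    private byte[] token = new byte[16];

    public PerTokenDemarcatorSketch(InputStream is, byte[] delimiter, int maxDataSize, int bufferSize) {
        if (delimiter.length == 0 || maxDataSize <= 0 || bufferSize <= 0) {
            throw new IllegalArgumentException("need a non-empty delimiter and positive sizes");
        }
        this.is = is;
        this.delimiter = delimiter.clone();
        this.maxDataSize = maxDataSize;
        this.buffer = new byte[bufferSize];
    }

    /** Refills the buffer, retrying while read() returns 0; returns false at end of stream. */
    private boolean fill() throws IOException {
        int bytesRead;
        do {
            bytesRead = is.read(buffer, 0, buffer.length);
        } while (bytesRead == 0);
        index = 0;
        available = Math.max(bytesRead, 0);
        // Deliberately no comparison with maxDataSize: a full buffer is not an oversized token.
        return bytesRead != -1;
    }

    /** Returns the next token without its delimiter, or null when the stream is exhausted. */
    public byte[] nextToken() throws IOException {
        int length = 0;
        while (true) {
            // Buffer drained: refill, or emit the trailing token once the stream has ended.
            if (index == available && (eos || !fill())) {
                eos = true;
                return length == 0 ? null : checkedCopy(length);
            }
            if (length == token.length) token = Arrays.copyOf(token, length * 2);
            token[length++] = buffer[index++];
            if (endsWithDelimiter(length)) {
                return checkedCopy(length - delimiter.length);
            }
            // With this many bytes and no delimiter yet, any token ending later exceeds the limit.
            if (length >= maxDataSize + delimiter.length) {
                throw new IOException("Token exceeds maxDataSize of " + maxDataSize + " bytes");
            }
        }
    }

    private boolean endsWithDelimiter(int length) {
        if (length < delimiter.length) return false;
        for (int i = 0; i < delimiter.length; i++) {
            if (token[length - delimiter.length + i] != delimiter[i]) return false;
        }
        return true;
    }

    /** The size limit is enforced here, on the token itself. */
    private byte[] checkedCopy(int length) throws IOException {
        if (length > maxDataSize) {
            throw new IOException("Token exceeds maxDataSize of " + maxDataSize + " bytes");
        }
        return Arrays.copyOf(token, length);
    }

    public static void main(String[] args) throws IOException {
        byte[] input = "A Great Benefit To Us All".getBytes(StandardCharsets.UTF_8);
        PerTokenDemarcatorSketch demarcator = new PerTokenDemarcatorSketch(
                new ByteArrayInputStream(input), "B".getBytes(StandardCharsets.UTF_8), 24, 4096);
        byte[] t;
        while ((t = demarcator.nextToken()) != null) {
            System.out.println("[" + new String(t, StandardCharsets.UTF_8) + "]");
        }
        // prints [A Great ] and then [enefit To Us All]
    }
}
```

The early check in `nextToken()` keeps memory bounded: once `maxDataSize + delimiter.length` bytes have accumulated without a delimiter, no later delimiter can produce a token within the limit, so the sketch fails at that point instead of buffering the whole oversized token. As with the `BufferedReader`-style loop in the diff, `fill()` would spin if a stream kept returning 0 indefinitely.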


> Refactor TextLineDemarcator and StreamDemarcator into a common abstract class
> -----------------------------------------------------------------------------
>
>                 Key: NIFI-2876
>                 URL: https://issues.apache.org/jira/browse/NIFI-2876
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Oleg Zhurakousky
>            Assignee: Oleg Zhurakousky
>            Priority: Minor
>             Fix For: 1.2.0
>
>
> Based on the work that has been performed as part of NIFI-2851, we now
> have a new class, TextLineDemarcator, with significantly faster logic for
> demarcating an InputStream. This new class's initial starting point was the
> existing LineDemarcator. They both now share ~60-70% of their code, which
> should be extracted into a common abstract class. The new (faster)
> demarcation logic should also be incorporated into StreamDemarcator.
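The refactoring described in the issue can be sketched with a template-method base class. This is a hypothetical illustration, not the actual NiFi code: `BaseDemarcator`, `DelimiterSplitter`, and `LineSplitter` are invented names, and line handling is simplified to `\n` and `\r\n` only. The base class owns the shared parts (read buffer, do/while refill, token accumulation, and a per-token `maxDataSize` check as argued in the review comment above); each subclass only reports whether the bytes read so far end with a delimiter.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical sketch of the refactoring; class names are illustrative, not NiFi's. */
public class DemarcatorRefactorSketch {
    /** Shared code: read buffer, do/while refill, token accumulation, per-token size limit. */
    abstract static class BaseDemarcator {
        private final InputStream is;
        private final int maxDataSize;
        private final byte[] buffer;
        private int index;          // next unread position in buffer
        private int available;      // number of valid bytes in buffer
        private boolean eos;        // true once read() has returned -1
        private byte[] token = new byte[16];

        BaseDemarcator(InputStream is, int maxDataSize, int bufferSize) {
            if (maxDataSize <= 0 || bufferSize <= 0) throw new IllegalArgumentException("sizes must be positive");
            this.is = is;
            this.maxDataSize = maxDataSize;
            this.buffer = new byte[bufferSize];
        }

        /** Subclass hook: length of the delimiter ending token[0, length), or 0 if there is none. */
        protected abstract int delimiterLengthAtEnd(byte[] token, int length);

        private boolean fill() throws IOException {
            int bytesRead;
            do {
                bytesRead = is.read(buffer, 0, buffer.length);
            } while (bytesRead == 0); // 0 bytes is not end of stream; only -1 is
            index = 0;
            available = Math.max(bytesRead, 0);
            return bytesRead != -1;
        }

        /** Returns the next token without its delimiter, or null when the stream is exhausted. */
        public byte[] nextToken() throws IOException {
            int length = 0;
            while (true) {
                // Buffer drained: refill, or emit the trailing token once the stream has ended.
                if (index == available && (eos || !fill())) {
                    eos = true;
                    return length == 0 ? null : checkedCopy(length);
                }
                if (length == token.length) token = Arrays.copyOf(token, length * 2);
                token[length++] = buffer[index++];
                int delimiterLength = delimiterLengthAtEnd(token, length);
                if (delimiterLength > 0) {
                    return checkedCopy(length - delimiterLength);
                }
            }
        }

        private byte[] checkedCopy(int length) throws IOException {
            if (length > maxDataSize) {
                throw new IOException("Token exceeds maxDataSize of " + maxDataSize + " bytes");
            }
            return Arrays.copyOf(token, length);
        }
    }

    /** StreamDemarcator-like role: splits on an arbitrary, non-empty byte delimiter. */
    static final class DelimiterSplitter extends BaseDemarcator {
        private final byte[] delimiter;

        DelimiterSplitter(InputStream is, byte[] delimiter, int maxDataSize, int bufferSize) {
            super(is, maxDataSize, bufferSize);
            if (delimiter.length == 0) throw new IllegalArgumentException("delimiter must not be empty");
            this.delimiter = delimiter.clone();
        }

        @Override
        protected int delimiterLengthAtEnd(byte[] token, int length) {
            if (length < delimiter.length) return 0;
            for (int i = 0; i < delimiter.length; i++) {
                if (token[length - delimiter.length + i] != delimiter[i]) return 0;
            }
            return delimiter.length;
        }
    }

    /** TextLineDemarcator-like role: splits on "\n" or "\r\n"; a lone "\r" is not a line end here. */
    static final class LineSplitter extends BaseDemarcator {
        LineSplitter(InputStream is, int maxDataSize, int bufferSize) {
            super(is, maxDataSize, bufferSize);
        }

        @Override
        protected int delimiterLengthAtEnd(byte[] token, int length) {
            if (length == 0 || token[length - 1] != '\n') return 0;
            return (length >= 2 && token[length - 2] == '\r') ? 2 : 1;
        }
    }
}
```

Because the base class asks the subclass hook after every byte, it never needs to know what a delimiter looks like; that keeps the shared code in one place, at the cost of a check per byte, and is chosen here for clarity rather than speed. The sketch also omits an early bound on oversized tokens, so a long token with no delimiter is buffered in full before it is rejected.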



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
