GitHub user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1214#discussion_r102718539
  
    --- Diff: nifi-commons/nifi-utils/src/main/java/org/apache/nifi/stream/io/util/StreamDemarcator.java ---
    @@ -102,99 +83,53 @@ public StreamDemarcator(InputStream is, byte[] delimiterBytes, int maxDataSize,
          * @throws IOException if unable to read from the stream
          */
         public byte[] nextToken() throws IOException {
    -        byte[] data = null;
    +        byte[] token = null;
             int j = 0;
    -
    -        while (data == null && this.buffer != null) {
    -            if (this.index >= this.readAheadLength) {
    +        nextTokenLoop:
    +        while (token == null && this.bufferLength != -1) {
    +            if (this.index >= this.bufferLength) {
                     this.fill();
                 }
    -            if (this.index >= this.readAheadLength) {
    -                data = this.extractDataToken(0);
    -                this.buffer = null;
    -            } else {
    -                byte byteVal = this.buffer[this.index++];
    -                if (this.delimiterBytes != null && this.delimiterBytes[j] == byteVal) {
    -                    if (++j == this.delimiterBytes.length) {
    -                        data = this.extractDataToken(this.delimiterBytes.length);
    +            if (this.bufferLength != -1) {
    +                byte byteVal;
    +                int i;
    +                for (i = this.index; i < this.bufferLength; i++) {
    +                    byteVal = this.buffer[i];
    +
    +                    boolean delimiterFound = false;
    +                    if (this.delimiterBytes != null && this.delimiterBytes[j] == byteVal) {
    --- End diff --
    
    This looks buggy. If this.delimiterBytes[j] == byteVal, we increment j. But
    if the next byte does not match, j has already been incremented and is never
    reset. As a result, once we have seen all of the delimiter's bytes in the
    proper order, we return that token even if those bytes are not contiguous.
    Please add the following unit test to the test case and you will see the
    failure:
    
    ```
        @Test
        public void testOnPartialMatchThenSubsequentPartialMatch() throws IOException {
            final byte[] inputData = "A Great Big Boy".getBytes(StandardCharsets.UTF_8);
            final byte[] delimBytes = "AB".getBytes(StandardCharsets.UTF_8);

            try (final InputStream is = new ByteArrayInputStream(inputData);
                final StreamDemarcator demarcator = new StreamDemarcator(is, delimBytes, 4096)) {

                // "A" and "B" both occur in the input but are never adjacent,
                // so the delimiter must not match and the whole input is one token.
                final byte[] bytes = demarcator.nextToken();
                assertArrayEquals(inputData, bytes);

                assertNull(demarcator.nextToken());
            }
        }
    ```
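
To make the failure mode in the comment above concrete, here is a minimal standalone sketch. It is not the StreamDemarcator code: the class and method names (DelimiterMatchSketch, findWithoutReset, findContiguous) are hypothetical, it works on a plain byte array instead of a buffered stream, and it assumes a non-empty delimiter. The first method advances j on every matching byte and never resets it, which is the behavior described above; the second uses a simple brute-force contiguous comparison, chosen only because it is easy to verify, not because it is what the fix should look like.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; names and structure are not from the NiFi code base.
class DelimiterMatchSketch {

    // Mimics the flagged logic: j advances on every matching byte and is never
    // reset on a mismatch. Returns the index of the byte that completes the
    // (possibly non-contiguous) "match", or -1 if the delimiter bytes never
    // all appear in order. Assumes delim is non-empty.
    static int findWithoutReset(byte[] data, byte[] delim) {
        int j = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == delim[j]) {
                if (++j == delim.length) {
                    return i;
                }
            }
        }
        return -1;
    }

    // Brute-force contiguous search: tries every start offset and compares the
    // whole delimiter. Returns the index of the last byte of the first
    // contiguous occurrence, or -1 if there is none. Assumes delim is non-empty.
    static int findContiguous(byte[] data, byte[] delim) {
        outer:
        for (int start = 0; start + delim.length <= data.length; start++) {
            for (int k = 0; k < delim.length; k++) {
                if (data[start + k] != delim[k]) {
                    continue outer; // mismatch: abandon this offset entirely
                }
            }
            return start + delim.length - 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] input = "A Great Big Boy".getBytes(StandardCharsets.UTF_8);
        byte[] delim = "AB".getBytes(StandardCharsets.UTF_8);
        System.out.println("without reset: " + findWithoutReset(input, delim));
        System.out.println("contiguous:    " + findContiguous(input, delim));
    }
}
```

For the input used in the unit test, the first method reports index 8 (the "B" in "Big", paired with the leading "A"), while the contiguous search reports -1, which is the result the test expects. Note that simply resetting j to 0 on a mismatch is not sufficient for delimiters that overlap with themselves (for example "AAB" in "AAAB"), so any fix along those lines would also need to re-examine the bytes already consumed.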

