[ https://issues.apache.org/jira/browse/NIFI-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15880553#comment-15880553 ]
ASF GitHub Bot commented on NIFI-2876:
--------------------------------------
Github user markap14 commented on a diff in the pull request:
https://github.com/apache/nifi/pull/1214#discussion_r102718539
--- Diff: nifi-commons/nifi-utils/src/main/java/org/apache/nifi/stream/io/util/StreamDemarcator.java ---
@@ -102,99 +83,53 @@ public StreamDemarcator(InputStream is, byte[] delimiterBytes, int maxDataSize,
* @throws IOException if unable to read from the stream
*/
public byte[] nextToken() throws IOException {
- byte[] data = null;
+ byte[] token = null;
int j = 0;
-
- while (data == null && this.buffer != null) {
- if (this.index >= this.readAheadLength) {
+ nextTokenLoop:
+ while (token == null && this.bufferLength != -1) {
+ if (this.index >= this.bufferLength) {
this.fill();
}
- if (this.index >= this.readAheadLength) {
- data = this.extractDataToken(0);
- this.buffer = null;
- } else {
- byte byteVal = this.buffer[this.index++];
- if (this.delimiterBytes != null && this.delimiterBytes[j] == byteVal) {
- if (++j == this.delimiterBytes.length) {
- data = this.extractDataToken(this.delimiterBytes.length);
+ if (this.bufferLength != -1) {
+ byte byteVal;
+ int i;
+ for (i = this.index; i < this.bufferLength; i++) {
+ byteVal = this.buffer[i];
+
+ boolean delimiterFound = false;
+ if (this.delimiterBytes != null && this.delimiterBytes[j] == byteVal) {
--- End diff --
This seems to be buggy. If this.delimiterBytes[j] == byteVal, we increment j. But if the next byte does not match, j has already been incremented and is never reset. As a result, if all of the delimiter's bytes appear in the proper order, we return a token even when those bytes are not contiguous. Please add the following unit test to the test case and you will see the failure:
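```
// Imports this test relies on (JUnit 4)
import static org.junit.Assert.assertArrayEquals;
import static org.junit.Assert.assertNull;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.nifi.stream.io.util.StreamDemarcator;
import org.junit.Test;

@Test
public void testOnPartialMatchThenSubsequentPartialMatch() throws IOException {
    // 'A' and 'B' both occur, but never as the contiguous sequence "AB",
    // so the whole input must come back as a single token.
    final byte[] inputData = "A Great Big Boy".getBytes(StandardCharsets.UTF_8);
    final byte[] delimBytes = "AB".getBytes(StandardCharsets.UTF_8);
    try (final InputStream is = new ByteArrayInputStream(inputData);
            final StreamDemarcator demarcator = new StreamDemarcator(is, delimBytes, 4096)) {
        final byte[] bytes = demarcator.nextToken();
        assertArrayEquals(inputData, bytes);
        assertNull(demarcator.nextToken());
    }
}
```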
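Simply resetting j to 0 on a mismatch is not quite enough either: with the delimiter "AAB" and the input "AAAB", a plain reset (even one that re-checks the current byte) misses the match, because after reading "AAA" the last two bytes are already a valid start of the delimiter. Below is a minimal sketch, not NiFi's code, of streaming delimiter matching that uses a KMP (Knuth-Morris-Pratt) prefix table so that a mismatch falls back to the longest delimiter prefix that still matches. The class `KmpDelimiterSplitter` and its methods are hypothetical, and it reads byte by byte instead of through StreamDemarcator's buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of NiFi: splits a stream on a multi-byte
// delimiter and handles partial matches via a KMP prefix table.
final class KmpDelimiterSplitter {

    // fail[i] = length of the longest proper prefix of delim[0..i]
    // that is also a suffix of delim[0..i].
    static int[] prefixTable(byte[] delim) {
        int[] fail = new int[delim.length];
        int k = 0;
        for (int i = 1; i < delim.length; i++) {
            while (k > 0 && delim[i] != delim[k]) {
                k = fail[k - 1];
            }
            if (delim[i] == delim[k]) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    // Empty tokens between adjacent delimiters are returned as empty arrays
    // (a choice made for this sketch only).
    static List<byte[]> split(InputStream in, byte[] delim) throws IOException {
        if (delim.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        int[] fail = prefixTable(delim);
        List<byte[]> tokens = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        int j = 0; // number of delimiter bytes matched so far
        int read;
        while ((read = in.read()) != -1) {
            byte b = (byte) read;
            current.write(b);
            // On a mismatch, fall back to the longest delimiter prefix that is
            // still a suffix of the bytes seen, instead of keeping a stale j.
            while (j > 0 && b != delim[j]) {
                j = fail[j - 1];
            }
            if (b == delim[j]) {
                j++;
            }
            if (j == delim.length) {
                // Contiguous match: the delimiter is the last delim.length
                // bytes of current, so drop them from the token.
                byte[] withDelim = current.toByteArray();
                tokens.add(Arrays.copyOf(withDelim, withDelim.length - delim.length));
                current.reset();
                j = 0;
            }
        }
        if (current.size() > 0) {
            // Trailing data, including any unfinished partial match.
            tokens.add(current.toByteArray());
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        byte[] input = "A Great Big Boy".getBytes(StandardCharsets.UTF_8);
        byte[] delim = "AB".getBytes(StandardCharsets.UTF_8);
        List<byte[]> tokens = split(new ByteArrayInputStream(input), delim);
        System.out.println(tokens.size()); // prints 1
    }
}
```

Running `main` on the input from the unit test prints 1, since "A" and "B" never appear back to back. With the delimiter "AAB", the input "xAAAByz" splits into "xA" and "yz".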
> Refactor TextLineDemarcator and StreamDemarcator into a common abstract class
> -----------------------------------------------------------------------------
>
> Key: NIFI-2876
> URL: https://issues.apache.org/jira/browse/NIFI-2876
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Oleg Zhurakousky
> Assignee: Oleg Zhurakousky
> Priority: Minor
> Fix For: 1.2.0
>
>
> Based on the work performed as part of NIFI-2851, we now have a new class,
> TextLineDemarcator, with significantly faster logic for demarcating an
> InputStream. Its initial starting point was the existing LineDemarcator. The
> two classes now share roughly 60-70% of their code, which should be extracted
> into a common abstract class; the new (faster) demarcation logic should also
> be incorporated into StreamDemarcator.
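A minimal sketch, under stated assumptions, of the kind of refactoring the issue describes: an abstract base class owns the buffering state (`buffer`, `index`, `bufferLength`, `fill()`, names borrowed from the diff in the comment) and each subclass supplies only its token-matching rule. `AbstractDemarcatorSketch` and `SingleByteDemarcatorSketch` are hypothetical names; this is not the class hierarchy NiFi adopted.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical base class: holds the buffering code that the demarcators
// would otherwise duplicate.
abstract class AbstractDemarcatorSketch implements Closeable {
    private final InputStream is;
    protected final byte[] buffer;
    protected int index;        // next unread position in buffer
    protected int bufferLength; // valid bytes in buffer, or -1 once EOF is hit

    protected AbstractDemarcatorSketch(InputStream is, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.is = is;
        this.buffer = new byte[bufferSize];
    }

    // Refill the buffer from the start; sets bufferLength to -1 at end of stream.
    protected void fill() throws IOException {
        index = 0;
        bufferLength = is.read(buffer, 0, buffer.length);
    }

    // Subclasses supply only the matching rule; null means no more tokens.
    public abstract byte[] nextToken() throws IOException;

    @Override
    public void close() throws IOException {
        is.close();
    }
}

// Hypothetical subclass: splits on a single delimiter byte (e.g. '\n').
class SingleByteDemarcatorSketch extends AbstractDemarcatorSketch {
    private final byte delimiter;

    SingleByteDemarcatorSketch(InputStream is, byte delimiter, int bufferSize) {
        super(is, bufferSize);
        this.delimiter = delimiter;
    }

    @Override
    public byte[] nextToken() throws IOException {
        ByteArrayOutputStream token = new ByteArrayOutputStream();
        boolean sawData = false;
        while (true) {
            if (index >= bufferLength) {
                fill();
                if (bufferLength == -1) {
                    // EOF: return trailing data, or null if nothing is left.
                    return sawData ? token.toByteArray() : null;
                }
            }
            byte b = buffer[index++];
            if (b == delimiter) {
                return token.toByteArray(); // the delimiter itself is dropped
            }
            token.write(b);
            sawData = true;
        }
    }
}
```

With a buffer size of 2, the input "a\nbc" yields "a", then "bc" (which spans two `fill()` calls), then null. A multi-byte StreamDemarcator-style subclass would reuse the same `fill()` and only replace the matching loop.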
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)