[
https://issues.apache.org/jira/browse/NIFI-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15880935#comment-15880935
]
ASF GitHub Bot commented on NIFI-2876:
--------------------------------------
Github user markap14 commented on a diff in the pull request:
https://github.com/apache/nifi/pull/1214#discussion_r102774788
--- Diff: nifi-commons/nifi-utils/src/main/java/org/apache/nifi/stream/io/util/AbstractDemarcator.java ---
@@ -98,22 +136,26 @@ void fill() throws IOException {
 }
 int bytesRead;
+ /*
+ * The do/while pattern is used here similar to the way it is used in
+ * BufferedReader essentially protecting from assuming the EOS until it
+ * actually is since not every implementation of InputStream guarantees
+ * that bytes are always available while the stream is open.
+ */
 do {
 bytesRead = this.is.read(this.buffer, this.index, this.buffer.length - this.index);
 } while (bytesRead == 0);
- this.bufferLength = bytesRead != -1 ? this.index + bytesRead : -1;
- if (this.bufferLength > this.maxDataSize) {
+ this.availableBytesLength = bytesRead != -1 ? this.index + bytesRead : -1;
+ if (this.availableBytesLength > this.maxDataSize) {
--- End diff --
This logic seems wrong to me. In effect it says: "if more data is available
in the buffer than the largest token that is allowed, we should fail." The
limit should apply to the size of a token, not to how many bytes happen to be
buffered. The following unit test fails, though I believe it should pass:
```
@Test
public void testLargeBufferSmallMaxSize() throws IOException {
    // The input is 25 bytes and the max size is 24, but no single token exceeds 24 bytes.
    final byte[] inputData = "A Great Benefit To Us All".getBytes(StandardCharsets.UTF_8);

    try (final InputStream is = new ByteArrayInputStream(inputData);
         final StreamDemarcator demarcator = new StreamDemarcator(is, "B".getBytes(StandardCharsets.UTF_8), 24, 4096)) {

        final byte[] first = demarcator.nextToken();
        assertNotNull(first);
        assertEquals("A Great ", new String(first, StandardCharsets.UTF_8));

        final byte[] second = demarcator.nextToken();
        assertNotNull(second);
        assertEquals("enefit To Us All", new String(second, StandardCharsets.UTF_8));

        assertNull(demarcator.nextToken());
    }
}
```
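To make the objection concrete, here is a minimal sketch of applying the limit to the token being assembled rather than to the number of bytes sitting in the buffer. It is not the NiFi implementation and does not use `AbstractDemarcator` or `StreamDemarcator`; the class `TokenSizeCheckSketch`, the method `split`, and its parameters are hypothetical names chosen for illustration, and it assumes a single-byte delimiter and a buffer size greater than zero.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of NiFi.
class TokenSizeCheckSketch {

    /**
     * Splits the stream on a single-byte delimiter. The size limit is checked
     * against the token currently being assembled, never against the number
     * of bytes that a single read() happened to place in the buffer.
     */
    static List<String> split(InputStream is, byte delimiter, int maxTokenSize, int bufferSize)
            throws IOException {
        final List<String> tokens = new ArrayList<>();
        final ByteArrayOutputStream token = new ByteArrayOutputStream();
        final byte[] buffer = new byte[bufferSize];

        while (true) {
            int bytesRead;
            // A read of 0 bytes does not mean end of stream, so retry until
            // data arrives or read() reports -1.
            do {
                bytesRead = is.read(buffer, 0, buffer.length);
            } while (bytesRead == 0);

            if (bytesRead == -1) {
                break;
            }

            for (int i = 0; i < bytesRead; i++) {
                if (buffer[i] == delimiter) {
                    tokens.add(new String(token.toByteArray(), StandardCharsets.UTF_8));
                    token.reset();
                } else {
                    token.write(buffer[i]);
                    // The limit applies per token, regardless of buffer fill.
                    if (token.size() > maxTokenSize) {
                        throw new IOException("Token exceeds the maximum size of " + maxTokenSize + " bytes");
                    }
                }
            }
        }

        // Emit whatever follows the last delimiter.
        if (token.size() > 0) {
            tokens.add(new String(token.toByteArray(), StandardCharsets.UTF_8));
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        final byte[] input = "A Great Benefit To Us All".getBytes(StandardCharsets.UTF_8);
        // Same numbers as the unit test above: limit 24, buffer 4096.
        System.out.println(split(new ByteArrayInputStream(input), (byte) 'B', 24, 4096));
    }
}
```

With the same input as the unit test, the whole 25-byte payload lands in the 4096-byte buffer in one read, yet the longest token is only 16 bytes, so a per-token check does not fail here.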
> Refactor TextLineDemarcator and StreamDemarcator into a common abstract class
> -----------------------------------------------------------------------------
>
> Key: NIFI-2876
> URL: https://issues.apache.org/jira/browse/NIFI-2876
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Oleg Zhurakousky
> Assignee: Oleg Zhurakousky
> Priority: Minor
> Fix For: 1.2.0
>
>
> Based on the work performed as part of NIFI-2851, we now have a new class
> with significantly faster logic for demarcating an InputStream
> (TextLineDemarcator). The starting point for this new class was the existing
> LineDemarcator. The two now share ~60-70% of their code, which should be
> extracted into a common abstract class. The new (faster) demarcation logic
> should also be incorporated into StreamDemarcator.