[ https://issues.apache.org/jira/browse/NIFI-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15880557#comment-15880557 ]
ASF GitHub Bot commented on NIFI-2876:
--------------------------------------
Github user markap14 commented on a diff in the pull request:
https://github.com/apache/nifi/pull/1214#discussion_r102718767
--- Diff: nifi-commons/nifi-utils/src/main/java/org/apache/nifi/stream/io/util/TextLineDemarcator.java ---
@@ -95,52 +71,61 @@ public OffsetInfo nextOffsetInfo() {
*
* @return offset info
*/
- public OffsetInfo nextOffsetInfo(byte[] startsWith) {
+ public OffsetInfo nextOffsetInfo(byte[] startsWith) throws IOException {
OffsetInfo offsetInfo = null;
- int lineLength = 0;
- byte[] token = null;
- lineLoop:
- while (this.bufferLength != -1) {
+ byte previousByteVal = 0;
+ byte[] data = null;
+ nextTokenLoop:
+ while (data == null && this.bufferLength != -1) {
if (this.index >= this.bufferLength) {
this.fill();
}
+ int delimiterSize = 0;
if (this.bufferLength != -1) {
- int i;
byte byteVal;
+ int i;
for (i = this.index; i < this.bufferLength; i++) {
byteVal = this.buffer[i];
- lineLength++;
- int crlfLength = computeEol(byteVal, i + 1);
- if (crlfLength > 0) {
- i += crlfLength;
- if (crlfLength == 2) {
- lineLength++;
- }
- offsetInfo = new OffsetInfo(this.offset, lineLength, crlfLength);
+
+ if (byteVal == 10) {
--- End diff --
Can we replace the inline constants 10 and 13 here with static final
member variables to make the code a little more readable?
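For illustration, the suggestion could look roughly like the sketch below. The class name `EolConstants`, the constant names `LF` and `CR`, and the helper `eolLength` are hypothetical and are not taken from the pull request. Treating a lone carriage return as a line ending is likewise an assumption made for this example.

```java
// Hypothetical sketch: the class, constant, and method names are illustrative
// and do not come from the NiFi code base.
final class EolConstants {

    /** Line feed ('\n'), the byte written as the literal 10 in the diff. */
    static final byte LF = 10;

    /** Carriage return ('\r'), the byte written as the literal 13 in the diff. */
    static final byte CR = 13;

    private EolConstants() {
        // constants holder, not meant to be instantiated
    }

    /**
     * Returns the length of the end-of-line sequence that starts at index i:
     * 2 for CR LF, 1 for a lone LF or a lone CR (assumption), 0 otherwise.
     * Only the first 'length' bytes of the buffer are considered valid.
     */
    static int eolLength(byte[] buffer, int i, int length) {
        byte byteVal = buffer[i];
        if (byteVal == LF) {
            return 1;
        }
        if (byteVal == CR) {
            // Look ahead one byte, staying within the valid part of the buffer.
            return (i + 1 < length && buffer[i + 1] == LF) ? 2 : 1;
        }
        return 0;
    }
}
```

With named constants, a check such as `byteVal == LF` states its intent without requiring the reader to recall the ASCII table.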
> Refactor TextLineDemarcator and StreamDemarcator into a common abstract class
> -----------------------------------------------------------------------------
>
> Key: NIFI-2876
> URL: https://issues.apache.org/jira/browse/NIFI-2876
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Oleg Zhurakousky
> Assignee: Oleg Zhurakousky
> Priority: Minor
> Fix For: 1.2.0
>
>
> Based on the work performed as part of NIFI-2851, we now have a new class
> (TextLineDemarcator) with significantly faster logic for demarcating an
> InputStream. Its starting point was the existing LineDemarcator. The two
> classes now share roughly 60-70% of their code, which should be extracted
> into a common abstract class. The new (faster) demarcation logic should also
> be incorporated into StreamDemarcator.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)