[
https://issues.apache.org/jira/browse/NIFI-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Oleg Zhurakousky updated NIFI-3495:
-----------------------------------
Description:
This condition is very rare. It occurs only when a read-ahead (a call to
_fill()_) is made inside the _isEol_ operation. The read-ahead sets a new
index, which the main _nextOffsetInfo_ operation then overwrites with its
stale value.
The fix is to track whether _isEol_ had to perform a read-ahead and, if it
did, not to reset the index.
More details:
While this component is modeled after the standard Java BufferedReader, which
simply reads and returns lines (delimited by CR, LF, or both), this reader also
records how each line was terminated (i.e., EOF, CR, LF, or CR followed by LF)
and returns that information to the caller as OffsetInfo.
For example, if you have a record "foo\r\nbar" and read it with
BufferedReader, you will get 'foo' and 'bar'. However, you will not know that
there was a CR and LF between the two tokens, and therefore will not be able to
restore the record to its original state if you need to. The TextLineDemarcator
returns an OffsetInfo, which holds the delimiter and other information.
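To illustrate the point about BufferedReader, here is a minimal sketch (the
class name LineReadingDemo is made up for this example) showing that three
differently terminated inputs all produce the same lines:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of NiFi.
final class LineReadingDemo {

    // Collects every line that BufferedReader.readLine() returns for the text.
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line); // the terminator has already been stripped
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Different terminators, identical result: the delimiter is lost.
        System.out.println(readLines("foo\r\nbar")); // prints [foo, bar]
        System.out.println(readLines("foo\nbar"));   // prints [foo, bar]
        System.out.println(readLines("foo\rbar"));   // prints [foo, bar]
    }
}
```

Since all three calls return the same list, the caller cannot tell which
terminator was present in the input.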
To accomplish this, every time we see a CR (13) we need to peek at the next
byte to see whether it is an LF (10). At the end of the buffer such a peek
becomes complicated, since we need to read more data. We did read more data,
but did not handle the index properly: it was set back to the old value after
the new one had been set inside _fill()_.
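The peek across a buffer boundary can be sketched as follows. This is not the
TextLineDemarcator code or the actual patch: PeekingLineScanner and its methods
are hypothetical names, and instead of tracking whether a read-ahead happened,
the sketch keeps the index in a single field that only read() and fill()
modify, so no stale copy can overwrite it. A buffer size of 4 forces the CR in
"foo\r\nbar" onto the last buffer position:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the NiFi TextLineDemarcator source.
final class PeekingLineScanner {
    private static final int CR = 13;
    private static final int LF = 10;

    private final InputStream in;
    private final byte[] buffer;
    private int index;     // next unread position in buffer
    private int available; // valid bytes in buffer; -1 once the stream ends
    private long offset;   // absolute offset of the next unread byte

    PeekingLineScanner(InputStream in, int bufferSize) {
        if (bufferSize < 1) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.in = in;
        this.buffer = new byte[bufferSize];
    }

    // Returns "start:length:delimiterLength" for the next line, or null at EOF.
    String next() throws IOException {
        long start = offset;
        int length = 0;
        while (true) {
            int b = read();
            if (b == -1) {
                return length == 0 ? null : start + ":" + length + ":0";
            }
            if (b == LF) {
                return start + ":" + length + ":1";
            }
            if (b == CR) {
                // peek() may refill the buffer and move index. No local copy
                // of index is kept here, so nothing overwrites the new value.
                if (peek() == LF) {
                    read(); // consume the LF of a CR LF pair
                    return start + ":" + length + ":2";
                }
                return start + ":" + length + ":1";
            }
            length++;
        }
    }

    // Consumes and returns the next byte, or -1 at end of stream.
    private int read() throws IOException {
        if (peek() == -1) {
            return -1;
        }
        offset++;
        return buffer[index++] & 0xFF;
    }

    // Returns the next byte without consuming it, refilling if needed.
    private int peek() throws IOException {
        if (available != -1 && index >= available) {
            fill();
        }
        return available == -1 ? -1 : buffer[index] & 0xFF;
    }

    // Replaces the buffer contents and resets index to the new start.
    private void fill() throws IOException {
        available = in.read(buffer, 0, buffer.length);
        index = 0;
    }

    // Convenience helper: scans the whole text with the given buffer size.
    static List<String> scan(String text, int bufferSize) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        PeekingLineScanner scanner =
                new PeekingLineScanner(new ByteArrayInputStream(bytes), bufferSize);
        List<String> lines = new ArrayList<>();
        try {
            String line;
            while ((line = scanner.next()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(scan("foo\r\nbar", 4)); // prints [0:3:2, 5:3:0]
    }
}
```

The result should not depend on the buffer size: scanning the same text with a
4-byte and a 64-byte buffer yields the same offsets and delimiter lengths.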
was:
This condition is very rare. It occurs only when a read-ahead (a call to
_fill()_) is made inside the _isEol_ operation. The read-ahead sets a new
index, which the main _nextOffsetInfo_ operation then overwrites with its
stale value.
The fix is to track whether _isEol_ had to perform a read-ahead and, if it
did, not to reset the index.
> TextLineDemarcator sets the wrong index when read ahead is performed in isEol
> operation
> ---------------------------------------------------------------------------------------
>
> Key: NIFI-3495
> URL: https://issues.apache.org/jira/browse/NIFI-3495
> Project: Apache NiFi
> Issue Type: Bug
> Reporter: Oleg Zhurakousky
> Assignee: Oleg Zhurakousky
> Priority: Critical
> Fix For: 1.2.0
>