[ 
https://issues.apache.org/jira/browse/NIFI-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleg Zhurakousky updated NIFI-3495:
-----------------------------------
    Description: 
This condition is very rare. It only occurs when a read-ahead (a call to 
_fill()_) is made inside the _isEol_ operation. The read-ahead sets a new 
index, which the main _nextOffsetInfo_ operation then resets. 
The fix is to track whether _isEol_ had to perform a read-ahead and, if it 
did, not reset the index.

More details.
While this component is modeled after the standard Java BufferedReader, which 
simply reads and returns lines (delimited by CR, LF, or both), this reader also 
records how each line was terminated (i.e., EOF, CR, LF, or CR followed by LF) 
and returns that to the caller as OffsetInfo. 
For example, if you have a record "foo\r\nbar" and read it with 
BufferedReader, you will get 'foo' and 'bar'. However, you will not know that 
there was a CR and LF between the two tokens, and therefore will not be able to 
restore the record to its original state if you need to. The 
TextLineDemarcator returns an OffsetInfo, which holds the delimiter and other 
information.
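
To illustrate the point about BufferedReader, here is a minimal sketch using 
only the JDK. The class and method names are made up for this example; it shows 
that three differently terminated inputs come back as the same two lines, so 
the original terminator cannot be recovered from the result.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: the class and method names are not part of NiFi.
class BufferedReaderDelimiterLoss {

    // Reads every line with the standard BufferedReader.
    // readLine() strips the terminator, whichever one it was.
    static List<String> readLines(String record) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(record))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(readLines("foo\r\nbar")); // [foo, bar]
        System.out.println(readLines("foo\nbar"));   // [foo, bar]
        System.out.println(readLines("foo\rbar"));   // [foo, bar]
    }
}
```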

So, to accomplish the above, every time we see a CR (13) we need to peek at the 
next byte to see if it's an LF (10). At the end of the buffer such a peek 
becomes complicated, since we need to read more data. We did that, but didn't 
handle the index properly: it was set back to the old value after the new one 
had been set inside fill().
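
The pattern can be sketched roughly as follows. This is a simplified, 
hypothetical reader written for illustration, not the actual 
TextLineDemarcator source: the class name, the fields, the 
"length/terminatorLength" output format and the alwaysResetIndex switch are all 
assumptions. Only the names fill() and isEol come from the description above. 
With a 4-byte buffer the CR of "foo\r\nbar" is the last byte in the buffer, so 
the peek has to read ahead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; this is not the NiFi TextLineDemarcator source.
class CrLfPeekSketch {
    private final InputStream in;
    private final byte[] buffer;
    private final boolean alwaysResetIndex; // true reproduces the faulty pattern
    private int index;         // position of the next unread byte
    private int limit;         // number of valid bytes in the buffer
    private boolean readAhead; // set when isEol() had to call fill()

    CrLfPeekSketch(InputStream in, int bufferSize, boolean alwaysResetIndex) {
        this.in = in;
        this.buffer = new byte[bufferSize];
        this.alwaysResetIndex = alwaysResetIndex;
    }

    // Replaces the buffer contents with fresh data and moves index to 0.
    private void fill() {
        try {
            limit = Math.max(in.read(buffer, 0, buffer.length), 0);
            index = 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the terminator length at position i: 0 (none), 1 (CR or LF), 2 (CR LF).
    private int isEol(byte b, int i) {
        if (b == '\n') return 1;
        if (b != '\r') return 0;
        if (i + 1 >= limit) {      // CR is the last byte: must read ahead to peek
            fill();                // index now refers to the refilled buffer
            readAhead = true;
            if (limit > 0 && buffer[0] == '\n') {
                index = 1;         // consume the LF in the new buffer
                return 2;
            }
            return 1;              // lone CR; index stays at 0
        }
        return buffer[i + 1] == '\n' ? 2 : 1;
    }

    // Returns {lineLength, terminatorLength}, or null at end of input.
    int[] nextLine() {
        int length = 0;
        while (true) {
            if (index >= limit) {
                fill();
                if (limit == 0) return length == 0 ? null : new int[] {length, 0};
            }
            int i = index;
            readAhead = false;
            int eol = isEol(buffer[i], i);
            if (eol > 0) {
                // The fix: leave index alone if isEol() already moved it via fill().
                if (!readAhead || alwaysResetIndex) index = i + eol;
                return new int[] {length, eol};
            }
            index = i + 1;
            length++;
        }
    }

    static List<String> demarcate(String text, int bufferSize, boolean alwaysResetIndex) {
        InputStream in = new ByteArrayInputStream(text.getBytes(StandardCharsets.US_ASCII));
        CrLfPeekSketch reader = new CrLfPeekSketch(in, bufferSize, alwaysResetIndex);
        List<String> out = new ArrayList<>();
        int[] line;
        while ((line = reader.nextLine()) != null) out.add(line[0] + "/" + line[1]);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demarcate("foo\r\nbar", 4, false)); // [3/2, 3/0]
        System.out.println(demarcate("foo\r\nbar", 4, true));  // [3/2]
    }
}
```

In this sketch, resetting the index after the read-ahead leaves it pointing 
past the refilled buffer, so the reader refills again and the bytes of "bar" 
are skipped. How the real component misbehaves may differ in detail.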

  was:
This condition is very rare. It only occurs when read ahead (call to _fill()_)  
is made inside of the _isEol_ operation which essentially sets the new index 
which then is reset inside of the main _nextOffsetInfo_ operation. 
So the fix is to basically monitor if _isEol_ had to perform read ahead and if 
it did do not reset the index.


> TextLineDemarcator sets the wrong index when read ahead is performed in isEol 
> operation
> ---------------------------------------------------------------------------------------
>
>                 Key: NIFI-3495
>                 URL: https://issues.apache.org/jira/browse/NIFI-3495
>             Project: Apache NiFi
>          Issue Type: Bug
>            Reporter: Oleg Zhurakousky
>            Assignee: Oleg Zhurakousky
>            Priority: Critical
>             Fix For: 1.2.0
>
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
