[ https://issues.apache.org/jira/browse/NIFI-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871981#comment-15871981 ]
ASF GitHub Bot commented on NIFI-3495:
--------------------------------------

GitHub user asfgit closed the pull request at:

    https://github.com/apache/nifi/pull/1518

> TextLineDemarcator sets the wrong index when read ahead is performed in isEol operation
> ---------------------------------------------------------------------------------------
>
>                 Key: NIFI-3495
>                 URL: https://issues.apache.org/jira/browse/NIFI-3495
>             Project: Apache NiFi
>          Issue Type: Bug
>            Reporter: Oleg Zhurakousky
>            Assignee: Oleg Zhurakousky
>            Priority: Critical
>             Fix For: 1.2.0
>
>
> This condition is very rare. It occurs only when a read ahead (a call to
> _fill()_) is made inside the _isEol_ operation. The _fill()_ call sets a new
> index, which the main _nextOffsetInfo_ operation then resets. The fix is to
> track whether _isEol_ had to perform a read ahead and, if it did, not reset
> the index.
> More details:
> This component is modeled after the standard Java BufferedReader, which
> simply reads and returns lines (delimited by CR, LF, or both). Unlike
> BufferedReader, it also records how each line terminated (i.e., EOF, CR, LF,
> or CR and LF) and returns that to the caller as OffsetInfo.
> For example, if you read the record "foo\r\nbar" with BufferedReader, you
> get 'foo' and 'bar'. However, you will not know that there was a CR and LF
> between the two tokens, so you cannot restore the record to its original
> state if you need to. The TextLineDemarcator returns an OffsetInfo, which
> holds the delimiter and other information.
> To accomplish this, every time we see a CR (13) we need to peek at the next
> byte to see whether it is an LF (10). At the end of the buffer such a peek
> becomes complicated because we need to read more data. We did read it, but we
> didn't handle the index properly: it was set back to the old value after the
> new one had been set inside fill().
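
For illustration only, here is a minimal sketch of the situation the description covers: a CR that lands on the last byte of the buffer, a peek that has to refill, and a flag that keeps the caller from overwriting the index the refill just set. This is not the NiFi source. The class name LineDemarcatorSketch, the readAhead flag, crDelimiterLength(), emit() and readAll() are all hypothetical names, and the sketch assumes it is enough to track offsets and lengths, so fill() simply overwrites the buffer instead of preserving earlier bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the NiFi TextLineDemarcator source.
class LineDemarcatorSketch {
    private static final byte CR = 13;
    private static final byte LF = 10;

    private final InputStream in;
    private final byte[] buffer;
    private int index;          // next unread position in buffer
    private int limit;          // number of valid bytes in buffer
    private long offset;        // absolute offset where the next line starts
    private boolean readAhead;  // set when the CR peek had to call fill()

    LineDemarcatorSketch(InputStream in, int bufferSize) {
        if (bufferSize < 1) throw new IllegalArgumentException("bufferSize must be >= 1");
        this.in = in;
        this.buffer = new byte[bufferSize];
    }

    // Overwrites the buffer with fresh data and repositions index at its start.
    private boolean fill() throws IOException {
        int n = in.read(buffer, 0, buffer.length);
        index = 0;
        limit = Math.max(n, 0);
        return n > 0;
    }

    // Returns 2 for CR LF and 1 for a lone CR, reading ahead if CR ends the buffer.
    private int crDelimiterLength(int i) throws IOException {
        int next = i + 1;
        if (next >= limit) {
            readAhead = true;
            if (!fill()) return 1;  // EOF right after CR
            next = index;           // fill() moved index to the new data
        }
        return buffer[next] == LF ? 2 : 1;
    }

    // Returns "start:length:delimiterLength" for the next line, or null at EOF.
    String nextLine() throws IOException {
        int length = 0;
        while (true) {
            if (index >= limit && !fill()) {
                return length == 0 ? null : emit(length, 0);
            }
            for (int i = index; i < limit; i++) {
                byte b = buffer[i];
                length++;
                int delim = b == LF ? 1 : (b == CR ? crDelimiterLength(i) : 0);
                if (delim == 0) continue;
                if (readAhead) {
                    // fill() already set index; only step over a peeked LF.
                    // Assigning index = i + delim here would reproduce the bug.
                    index += delim - 1;
                    readAhead = false;
                } else {
                    index = i + delim;
                }
                return emit(length + delim - 1, delim);
            }
            index = limit;  // buffer consumed without finding a delimiter
        }
    }

    private String emit(int length, int delim) {
        String line = offset + ":" + length + ":" + delim;
        offset += length;
        return line;
    }

    // Convenience helper: collects every line descriptor for the given text.
    static List<String> readAll(String text, int bufferSize) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        LineDemarcatorSketch d = new LineDemarcatorSketch(new ByteArrayInputStream(bytes), bufferSize);
        List<String> lines = new ArrayList<>();
        try {
            for (String line = d.nextLine(); line != null; line = d.nextLine()) lines.add(line);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(readAll("foo\r\nbar", 4));   // CR is the last byte of the first buffer
        System.out.println(readAll("foo\r\nbar", 64));  // whole record fits in one buffer
    }
}
```

With a 4-byte buffer the CR in "foo\r\nbar" is the last buffered byte, so the peek goes through fill(). Both calls in main should report the same two lines. If the read-ahead branch instead assigned index = i + delim, the index would point past the refilled data and "bar" would be skipped.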
-- This message was sent by Atlassian JIRA (v6.3.15#6346)