Jessica Wise created BEAM-10170:
-----------------------------------
Summary: TextBasedReader may not respect the source end offset
Key: BEAM-10170
URL: https://issues.apache.org/jira/browse/BEAM-10170
Project: Beam
Issue Type: Bug
Components: beam-model
Reporter: Jessica Wise
[TextBasedReader|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSource.java#L88]
is backed by a TextSource, which may have a start and end offset. If the end
offset does not correspond to a delimiter, the TextBasedReader will not respect
the end offset and will instead read past the end offset to the next instance
of a delimiter. See
[TextBasedReader#findDelimiterBounds|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSource.java#L176]
which finds the end of the next record to read. This method will "consume the
channel till either EOF or the delimiter bounds are found." I believe this is
a bug: the method should also stop at the end offset, not only at EOF or a
delimiter.
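
To illustrate the behavior, here is a minimal sketch that does not use Beam's code. The class EndOffsetSketch and both methods are hypothetical, and the sketch assumes a single-byte '\n' delimiter and an in-memory byte array rather than a channel. The first method scans only for a delimiter or EOF, as described above; the second also stops at the end offset, which is the check this report suggests is missing.

```java
import java.nio.charset.StandardCharsets;

public class EndOffsetSketch {
    // Hypothetical stand-in for the reported behavior (not Beam code):
    // scan from `start` until a '\n' delimiter or EOF. No end offset is consulted.
    static int findRecordEnd(byte[] data, int start) {
        int i = start;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        return i; // exclusive end index of the record
    }

    // Hypothetical variant with the extra check: also stop at `endOffset`.
    static int findRecordEndBounded(byte[] data, int start, int endOffset) {
        int limit = Math.min(endOffset, data.length);
        int i = start;
        while (i < limit && data[i] != '\n') {
            i++;
        }
        return i; // exclusive end index, never past endOffset
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\n".getBytes(StandardCharsets.US_ASCII);
        int start = 6;     // first byte of "bravo"
        int endOffset = 8; // falls inside "bravo", not on a delimiter

        int unbounded = findRecordEnd(data, start);
        int bounded = findRecordEndBounded(data, start, endOffset);

        System.out.println("unbounded end = " + unbounded + ", record = "
            + new String(data, start, unbounded - start, StandardCharsets.US_ASCII));
        System.out.println("bounded end = " + bounded + ", record = "
            + new String(data, start, bounded - start, StandardCharsets.US_ASCII));
    }
}
```

With an end offset of 8, the first method returns 11 (the delimiter after "bravo"), reading three bytes past the end offset, while the second stops at 8. The sketch only shows the difference between the two checks; it does not settle which one the source's split semantics should require.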