nikie edited a comment on pull request #15775:
URL: https://github.com/apache/beam/pull/15775#issuecomment-950366215


   There is another issue, related to bundle splitting, that needs to be 
addressed.
   Given a desired `start_offset` provided by `range_tracker`, this code in the 
`read_records` method 
https://github.com/apache/beam/blob/6173abb2e241865d89dff9ae679f4d422be84ee9/sdks/python/apache_beam/io/textio.py#L169-L183
 finds the start of the next line at or after `start_offset`. It looks back 1 
character to determine whether a line starts exactly at `start_offset`. This 
assumes that the line separator always ends with `\n` (the default delimiter), 
which may not hold for a custom delimiter.
   
   Here is the similar `TextSource` code in the Java SDK for comparison; it 
takes the custom delimiter into account:
   
https://github.com/apache/beam/blob/6173abb2e241865d89dff9ae679f4d422be84ee9/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSource.java#L149-L160
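
   To make the difference concrete, here is a minimal in-memory sketch of the 
two look-back strategies. It is not the actual `read_records` or `TextSource` 
code: the function names `next_record_start_one_byte` and 
`next_record_start_delimiter_aware` are hypothetical, and the whole input is 
assumed to fit in a single `bytes` object, with no buffering or range tracking.

```python
def next_record_start_one_byte(data: bytes, start_offset: int,
                               delimiter: bytes = b"\n") -> int:
    """Hypothetical model of a fixed one-byte look-back."""
    if start_offset == 0:
        return 0
    # Search for the delimiter starting one byte before start_offset.
    pos = data.find(delimiter, start_offset - 1)
    # No delimiter found: no further record starts in this range.
    return len(data) if pos < 0 else pos + len(delimiter)


def next_record_start_delimiter_aware(data: bytes, start_offset: int,
                                      delimiter: bytes = b"\n") -> int:
    """Hypothetical model of a look-back sized to the delimiter length."""
    if start_offset == 0:
        return 0
    # Look back by the full delimiter length (bounded by the file start),
    # so a delimiter ending exactly at start_offset is still detected.
    look_back = min(len(delimiter), start_offset)
    pos = data.find(delimiter, start_offset - look_back)
    return len(data) if pos < 0 else pos + len(delimiter)
```

   With `data = b"a||b||c"` and delimiter `b"||"`, the record `b` starts at 
offset 3. For `start_offset=3` the one-byte version returns 6, skipping `b`, 
while the delimiter-aware version returns 3. With the default `b"\n"` 
delimiter both versions agree.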


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

