nikie edited a comment on pull request #15775: URL: https://github.com/apache/beam/pull/15775#issuecomment-950366215
There is another issue, related to bundle splitting, that needs to be addressed. This code in the `split_records` method https://github.com/apache/beam/blob/6173abb2e241865d89dff9ae679f4d422be84ee9/sdks/python/apache_beam/io/textio.py#L169-L183 takes the desired `start_offset` provided by the `range_tracker` and finds the start of the next line at or after `start_offset`. To detect a line that begins exactly at `start_offset`, it seeks back only 1 character before searching for a separator. This assumes that the line separator always ends with `\n` (the default delimiter). With a custom delimiter longer than one character, a one-character lookback sees only the last character of the separator that precedes `start_offset`, so it cannot recognize that separator and the line starting at `start_offset` is skipped. For comparison, here is the corresponding `TextSource` code in the Java SDK, which takes the custom delimiter into account: https://github.com/apache/beam/blob/6173abb2e241865d89dff9ae679f4d422be84ee9/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSource.java#L149-L160
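To make the failure concrete, here is a minimal pure-Python simulation. It is not Beam code: `next_record_start` and `read_bundle` are hypothetical helpers that work on an in-memory byte string instead of a file and a range tracker. `lookback=1` models the one-character seek described above, and `lookback=len(delimiter)` models seeking back by the full delimiter length, which appears to be what the linked Java code does (an assumption, since that code is not reproduced here). Self-overlapping delimiters are not considered.

```python
from typing import List


def next_record_start(data: bytes, start_offset: int,
                      delimiter: bytes, lookback: int) -> int:
    """Hypothetical helper: offset of the first record at or after start_offset.

    Simulates seeking to (start_offset - lookback) and skipping past the first
    delimiter found from there, i.e. skipping the partial record that belongs
    to the previous bundle.
    """
    if start_offset == 0:
        return 0  # The first bundle always owns the record at offset 0.
    seek_pos = max(start_offset - lookback, 0)
    idx = data.find(delimiter, seek_pos)
    if idx == -1:
        return len(data)  # No record starts inside this bundle.
    return idx + len(delimiter)


def read_bundle(data: bytes, start: int, end: int,
                delimiter: bytes, lookback: int) -> List[bytes]:
    """Hypothetical helper: records whose first byte lies in [start, end)."""
    records = []
    pos = next_record_start(data, start, delimiter, lookback)
    while pos < min(end, len(data)):
        idx = data.find(delimiter, pos)
        stop = len(data) if idx == -1 else idx
        records.append(data[pos:stop])
        pos = len(data) if idx == -1 else idx + len(delimiter)
    return records


if __name__ == "__main__":
    data = b"ab|||cd|||ef"  # records start at offsets 0, 5 and 10
    delimiter = b"|||"
    split = 5  # bundle boundary falls exactly on the start of b"cd"
    for lookback in (1, len(delimiter)):
        records = (read_bundle(data, 0, split, delimiter, lookback)
                   + read_bundle(data, split, len(data), delimiter, lookback))
        print(lookback, records)
    # prints:
    # 1 [b'ab', b'ef']
    # 3 [b'ab', b'cd', b'ef']
```

With the bundle boundary at offset 5, the one-character lookback drops `b'cd'`, while seeking back by the delimiter length keeps every record. For the default `\n` delimiter the two strategies coincide, which is why the existing code works there.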
