nikie commented on pull request #15775:
URL: https://github.com/apache/beam/pull/15775#issuecomment-955011357


   > > Thanks for the snippet. The case I was concerned about is where we use a 
default separator (evaluates to \n), but split between '\r\n', not a custom 
separator that is also \r\n. I checked that it also works, since we just extend 
the buffer, and then still look back 1 character when we encounter \n.
   > 
   > Let's make a test case out of your code snippet, if there isn't one that 
covers this scenario. Thanks.
   
   Nice catch, @tvalentyn, but it works! :)
   Here is the test which succeeds:
   ```
     def test_read_crlf_split_by_buffer(self):
       file_name, expected_data = write_data(3, eol=EOL.CRLF)
       assert len(expected_data) == 3
       self._run_read_test(
           file_name, expected_data, buffer_size=6)
   ```
   This works because the buffer is not discarded once its end is reached:
   * at the end of the 1st iteration we have `b'line0\r'` in the buffer
   * at the beginning of the next iteration, 
`self._try_to_ensure_num_bytes_in_buffer` extends the buffer to 
`b'line0\r\nline1'`
   * `b'\n'` is found as a possible next separator, and the look-back check 
picks up the preceding `b'\r'`
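
To illustrate the steps above outside of Beam, here is a minimal standalone sketch of the same idea. It is not the actual `textio` code: `read_lines` is a hypothetical helper, and the only assumption carried over is that unconsumed bytes stay in the buffer when it is extended, so a `\r` at the end of one chunk is still visible when the `\n` arrives in the next.

```python
import io


def read_lines(stream, buffer_size=6):
    # Hypothetical helper, not Beam's implementation: yields records from a
    # binary stream, accepting both LF and CRLF as the line terminator.
    buffer = b""
    pos = 0  # start of the current, not yet emitted record
    while True:
        newline_at = buffer.find(b"\n", pos)
        if newline_at == -1:
            chunk = stream.read(buffer_size)
            if not chunk:
                break
            # Extend rather than discard: unconsumed bytes, including a
            # trailing CR, are kept in front of the new chunk.
            buffer = buffer[pos:] + chunk
            pos = 0
            continue
        end = newline_at
        # Look back one byte: a CR right before the LF belongs to the separator.
        if end > pos and buffer[end - 1:end] == b"\r":
            end -= 1
        yield buffer[pos:end]
        pos = newline_at + 1
    if pos < len(buffer):
        # Last record without a trailing separator.
        yield buffer[pos:]


data = b"line0\r\nline1\r\nline2\r\n"
lines = list(read_lines(io.BytesIO(data), buffer_size=6))
```

With `buffer_size=6` the first chunk is exactly `b'line0\r'`, so the `\r\n` pair is split across two reads, which is the scenario the test targets.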



