nikie commented on a change in pull request #15901: URL: https://github.com/apache/beam/pull/15901#discussion_r746062495
##########
File path: sdks/python/apache_beam/io/textio.py
##########
@@ -362,6 +391,15 @@ def _is_self_overlapping(delimiter):
       return True
     return False

+  def _is_escaped(self, read_buffer, position):
+    # Returns True if byte at position is preceded with an odd number
+    # of escapechar bytes or False if preceded by 0 or even escapes
+    # (the even number means that all the escapes are escaped themselves).
+    for current_pos in reversed(range(-1, position)):
+      if read_buffer.data[current_pos:current_pos + 1] != self._escapechar:

Review comment:
Totally agree about the trickiness. But it is beautiful and works :) I will look for a more explicit solution if you insist, though.

1. The `-1` lets us check whether the first character is escaped without a separate check for whether it is the first one, and lets us return from the same `if` block once we reach the start of the buffer. The `-1:0` slice returns empty bytes, which can never equal the escapechar because we do not allow an empty one:
```
>>> b'\\'[-1:0]
b''
>>> b'\\'[-1:0] == b'\\'
False
```
2. No, we cannot compare a single byte to `bytes`: indexing a `bytes` object returns an `int`, while slicing returns `bytes`. (I was a bit surprised as well, but we are doing similar comparisons in `_find_separator_bounds`.)
```
>>> b'a'[0] == b'a'
False
>>> b'a'[0:1] == b'a'
True
```

"why not to count preceding_escape_chars": we are kind of counting, just using the loop variable for it. By the time we reach a non-escapechar, its offset from `position` gives the number of consecutive escapechars.
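
To make the comparison concrete, here is a minimal standalone sketch of both approaches. It is not the PR's code: the function names `is_escaped` and `is_escaped_explicit`, the plain `bytes` argument in place of `read_buffer.data`, and the final parity computation (the diff above is cut off before the return statement) are all assumptions made for illustration.

```python
def is_escaped(data: bytes, position: int, escapechar: bytes = b'\\') -> bool:
    """Loop-variable variant (hypothetical reconstruction of the idea)."""
    # Walk backwards from position - 1 down to -1. The slice data[-1:0] is
    # always empty, so it can never equal a non-empty escapechar and the
    # loop stops there at the latest.
    for current_pos in reversed(range(-1, position)):
        if data[current_pos:current_pos + 1] != escapechar:
            # Escapes occupy indices current_pos + 1 .. position - 1.
            return (position - current_pos - 1) % 2 == 1
    return False  # Only reached when position < 0 (empty range).


def is_escaped_explicit(data: bytes, position: int,
                        escapechar: bytes = b'\\') -> bool:
    """Explicit-counter variant, as suggested in the review."""
    preceding_escape_chars = 0
    current_pos = position - 1
    # One-byte slices are used because indexing bytes yields an int.
    while current_pos >= 0 and data[current_pos:current_pos + 1] == escapechar:
        preceding_escape_chars += 1
        current_pos -= 1
    return preceding_escape_chars % 2 == 1


samples = [(b'a\\,b', 2), (b'a\\\\,b', 3), (b'\\,', 1), (b',', 0)]
results = [(is_escaped(d, p), is_escaped_explicit(d, p)) for d, p in samples]
```

Under these assumptions the two variants agree on every sample: the comma is escaped after one backslash, not escaped after two, and the start-of-buffer case needs no special handling in either version.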