nikie commented on a change in pull request #15901:
URL: https://github.com/apache/beam/pull/15901#discussion_r746062495



##########
File path: sdks/python/apache_beam/io/textio.py
##########
@@ -362,6 +391,15 @@ def _is_self_overlapping(delimiter):
         return True
     return False
 
+  def _is_escaped(self, read_buffer, position):
+    # Returns True if byte at position is preceded with an odd number
+    # of escapechar bytes or False if preceded by 0 or even escapes
+    # (the even number means that all the escapes are escaped themselves).
+    for current_pos in reversed(range(-1, position)):
+      if read_buffer.data[current_pos:current_pos + 1] != self._escapechar:

Review comment:
       Totally agree about the trickiness. But it is beautiful and works :) 
   I will look for a more explicit solution if you insist, though.
   
   1. The -1 lets us check whether the 1st character is escaped without a 
separate "is this the first byte" check, and lets us return from the same if 
block when we reach the start of the buffer. The -1:0 slice returns empty 
bytes, which can never equal escapechar because we do not allow an empty one:
   ```
   >>> b'\\'[-1:0]
   b''
   >>> b'\\'[-1:0] == b'\\'
   False
   ```
   2. No, we cannot compare a single byte to `bytes`: in Python 3, indexing 
`bytes` returns an `int`, which never equals a `bytes` object (I was a bit 
surprised as well, but we are doing similar comparisons in 
`_find_separator_bounds`):
   ```
   >>> b'a'[0] == b'a'
   False
   >>> b'a'[0:1] == b'a'
   True
   ```
   "why not to count preceding_escape_chars" - we are effectively counting, 
just using the loop variable for it (by the time we reach a non-escapechar, 
its offset from position gives the number of consecutive escapechars)
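
   For comparison, here is a rough standalone sketch of both approaches side 
by side. It is an illustration only, not the code in this PR: the function 
names `is_escaped_loop_var` and `is_escaped_explicit` are made up, `data` is a 
plain `bytes` object standing in for `read_buffer.data`, a single-byte 
escapechar is assumed, and the return expression of the loop-variable variant 
is my assumption about how the count is derived from `current_pos`.

```python
def is_escaped_loop_var(data: bytes, position: int, escapechar: bytes) -> bool:
    """Loop-variable variant (assumed shape, not copied from the PR)."""
    # range(-1, position) adds a sentinel index -1: data[-1:0] is b'',
    # which never equals a non-empty escapechar, so the loop always
    # returns from inside the if block when position >= 0.
    for current_pos in reversed(range(-1, position)):
        if data[current_pos:current_pos + 1] != escapechar:
            # Escapes passed so far: position - current_pos - 1.
            return (position - current_pos - 1) % 2 == 1
    return False  # Only reached for a negative position.


def is_escaped_explicit(data: bytes, position: int, escapechar: bytes) -> bool:
    """Explicit-counter variant: count escapes directly before position."""
    escape_count = 0
    current_pos = position - 1
    while current_pos >= 0 and data[current_pos:current_pos + 1] == escapechar:
        escape_count += 1
        current_pos -= 1
    return escape_count % 2 == 1


# Bytes: a \ " b \ \ " c "
sample = b'a\\"b\\\\"c"'
for pos in range(len(sample) + 1):
    # Both variants should agree at every position, including 0.
    assert is_escaped_loop_var(sample, pos, b'\\') == \
        is_escaped_explicit(sample, pos, b'\\')
```

   The explicit version costs an extra variable and a bounds check in the 
loop condition, while the loop-variable version folds both into the sentinel 
index, which is where the trickiness comes from.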




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

