[GitHub] [iceberg] rdblue commented on a diff in pull request #6532: Python: Fix reading 0-bytes binary

GitBox Sun, 08 Jan 2023 10:57:00 -0800


rdblue commented on code in PR #6532:
URL: https://github.com/apache/iceberg/pull/6532#discussion_r1064187384



##########
python/pyiceberg/avro/decoder.py:
##########
@@ -48,13 +48,18 @@ def read(self, n: int) -> bytes:
         """
         if n < 0:
             raise ValueError(f"Requested {n} bytes to read, expected positive integer.")
-        read_bytes = self._input_stream.read(n)
-        read_len = len(read_bytes)
-        if read_len <= 0:
-            raise EOFError
-        elif read_len != n:
-            raise ValueError(f"Read {len(read_bytes)} bytes, expected {n} bytes")
-        return read_bytes
+        data = b""
+
+        n_remaining = n
+        while n_remaining > 0:
+            data_read = self._input_stream.read(n_remaining)
+            read_len = len(data_read)
+            if read_len <= 0:
+                raise EOFError(f"Got negative length: {read_len}")
+            data += data_read

Review Comment:
   Looks like `data += data_read` will reallocate and copy the accumulated bytes on each iteration, according to [this post on efficient byte operations](https://www.guyrutenberg.com/2020/04/04/fast-bytes-concatenation-in-python/). From the post, it looks like we should either keep a list of read buffers and call `join` at the end, or use a `bytearray` to accumulate data and convert it to `bytes` at the end.
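
   For illustration, here is a minimal sketch of the `bytearray` option. It is not the code in this PR: `read_exactly` is a hypothetical standalone helper, and `stream` stands in for `self._input_stream`, assumed to be any object whose `read(n)` returns at most `n` bytes and an empty result at end of stream.

```python
import io


def read_exactly(stream, n: int) -> bytes:
    """Read exactly n bytes from stream, tolerating short reads.

    Hypothetical helper for illustration; `stream` is assumed to behave
    like a binary file object (read(n) returns at most n bytes, and an
    empty bytes object at end of stream).
    """
    if n < 0:
        raise ValueError(f"Requested {n} bytes to read, expected a non-negative integer.")

    # A bytearray is mutable, so += extends it in place instead of
    # building a new bytes object on every iteration.
    buffer = bytearray()
    n_remaining = n
    while n_remaining > 0:
        chunk = stream.read(n_remaining)
        if not chunk:
            raise EOFError(f"Stream ended with {n_remaining} of {n} bytes still unread")
        buffer += chunk
        n_remaining -= len(chunk)

    # Convert once at the end so callers still receive immutable bytes.
    return bytes(buffer)


# Usage: a zero-length read returns b"" without touching the stream.
print(read_exactly(io.BytesIO(b"hello world"), 5))
print(read_exactly(io.BytesIO(b""), 0))
```

   The list-based alternative would append each chunk to a list and return `b"".join(chunks)`. Either way the data is copied once at the end rather than on every pass through the loop.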



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

