pitrou commented on code in PR #37526:
URL: https://github.com/apache/arrow/pull/37526#discussion_r1323057517
##########
dev/archery/archery/integration/datagen.py:
##########
@@ -743,6 +763,82 @@ class LargeStringColumn(_BaseStringColumn,
_LargeOffsetsMixin):
pass
+class BinaryViewColumn(PrimitiveColumn):
+
+    def _encode_value(self, x):
+        return frombytes(binascii.hexlify(x).upper())
+
+    def _get_buffers(self):
+        char_buffers = []
+        # a small default char buffer size is used so we get multiple
+        # character buffers without massive arrays
+        DEFAULT_BUFFER_SIZE = 32
+        INLINE_SIZE = 12
+
+        data = []
+        for i, v in enumerate(self.values):
+            if not self.is_valid[i]:
+                v = b''
+            assert isinstance(v, bytes)
+
+            if len(v) > INLINE_SIZE:
+                offset = 0
+                if len(v) > DEFAULT_BUFFER_SIZE:
Review Comment:
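This check is probably redundant with the third condition below
(`len(char_buffers[-1]) + len(v) > DEFAULT_BUFFER_SIZE`): whenever
`len(v) > DEFAULT_BUFFER_SIZE` holds and `char_buffers` is non-empty,
that sum exceeds `DEFAULT_BUFFER_SIZE` as well.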
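
A minimal sketch of why the check looks redundant. The rest of the condition chain is not visible in this hunk, so the helper below is hypothetical: it assumes the chain is "oversized value, or no buffer yet, or last buffer would overflow", and `starts_new_buffer` and `check_oversize_first` are names made up for illustration, not part of `datagen.py`.

```python
DEFAULT_BUFFER_SIZE = 32  # same value as in the hunk above


def starts_new_buffer(char_buffers, v, check_oversize_first):
    """Hypothetical helper: decide whether `v` must open a new char buffer.

    Assumed condition chain (not confirmed by the visible diff):
      1. len(v) > DEFAULT_BUFFER_SIZE             (the check under review)
      2. there is no char buffer yet
      3. len(char_buffers[-1]) + len(v) > DEFAULT_BUFFER_SIZE
    """
    if check_oversize_first and len(v) > DEFAULT_BUFFER_SIZE:
        return True
    if not char_buffers:
        return True
    return len(char_buffers[-1]) + len(v) > DEFAULT_BUFFER_SIZE


# Compare both variants for every small combination of sizes, including
# the case where no char buffer exists yet (last_len is None).
for last_len in [None] + list(range(2 * DEFAULT_BUFFER_SIZE)):
    buffers = [] if last_len is None else [b"x" * last_len]
    for n in range(2 * DEFAULT_BUFFER_SIZE):
        v = b"y" * n
        with_check = starts_new_buffer(buffers, v, True)
        without_check = starts_new_buffer(buffers, v, False)
        assert with_check == without_check
```

Under that assumed chain, dropping the first check never changes the decision, because `len(char_buffers[-1])` is non-negative and an empty `char_buffers` is already handled by the second condition.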