zeroshade commented on issue #48883:
URL: https://github.com/apache/arrow/issues/48883#issuecomment-3769187569

   Many (possibly most) implementations of Arrow will pad the buffers to a 
particular alignment for vectorization reasons, commonly 32 or 64 bits, and 
they don't always *fully* truncate the buffers when writing the IPC files. 
Given the extensive integration tests that we do for IPC compatibility between 
implementations, I would say that if the current implementations accept option 
(b), which currently seems to be the case, then that is the standard we should 
allow. 
   
   Looking at the linked polars issue in particular: if PyArrow can read the 
IPC file without issue, that means the C++ implementation allows it, which 
means that all of the implementations based on it allow it too (R, Ruby, 
GObject, etc.). In particular, given that the current IPC integration tests 
don't fail on the IPC files generated by arrow-go, I would wager that all the 
major implementations allow for the case where the uncompressed buffer is 
larger than it strictly needs to be. 
   
   While it's likely a good suggestion for me to update the Go implementation 
to better truncate the validity bitmaps, I would argue that polars should 
accept the generated IPC files since, by all accounts, they seem to be 
considered valid IPC files. That said, it would be equally valid for polars to 
reference only the necessary bytes. For example, in the case of 5 rows, if the 
file says the uncompressed size is 4 bytes (because of the padding), it would 
be perfectly valid for the actual buffer that polars uses to have a length of 
1 byte, with polars simply ignoring the extra 3 bytes (which should all be 
zeroed anyway).
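
   To make the 5-row case concrete, here is a rough sketch of the arithmetic 
and of a reader that keeps only the bytes it needs. It is not taken from polars, 
arrow-go, or any other implementation; the helper names (`bitmap_bytes`, 
`padded_size`, `trim_validity`) are made up for illustration, and the 4-byte 
alignment is just the figure from the example above.

```python
def bitmap_bytes(num_rows: int) -> int:
    """Minimum number of bytes needed to hold one validity bit per row."""
    return (num_rows + 7) // 8


def padded_size(num_bytes: int, alignment: int) -> int:
    """Round num_bytes up to the next multiple of alignment (in bytes)."""
    return (num_bytes + alignment - 1) // alignment * alignment


def trim_validity(buf: bytes, num_rows: int) -> memoryview:
    """Return a zero-copy view of only the bytes the bitmap needs.

    A buffer that is longer than necessary is accepted and the trailing
    padding is ignored; a buffer that is too short is rejected.
    """
    needed = bitmap_bytes(num_rows)
    if len(buf) < needed:
        raise ValueError(
            f"validity buffer has {len(buf)} bytes, need at least {needed}"
        )
    return memoryview(buf)[:needed]


# Hypothetical writer output: 5 valid rows, padded to 4 bytes (assumed alignment).
padded = bytes([0b00011111, 0, 0, 0])
trimmed = trim_validity(padded, num_rows=5)
```

With these definitions, `bitmap_bytes(5)` is 1 and `padded_size(1, 4)` is 4, 
so the reader ends up with a 1-byte view over a 4-byte buffer and never looks at 
the padding.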
   
   disclaimer: I'm the primary developer/maintainer of the Go implementation


