mqy commented on a change in pull request #9137:
URL: https://github.com/apache/arrow/pull/9137#discussion_r555029409
##########
File path: rust/arrow/src/ipc/writer.rs
##########
@@ -712,15 +762,24 @@ fn write_buffer(
buffers: &mut Vec<ipc::Buffer>,
arrow_data: &mut Vec<u8>,
offset: i64,
-) -> i64 {
- let len = buffer.len();
+ codec: Option<&Codec>,
+) -> Result<i64> {
+ let mut compressed = Vec::new();
+ let buf_slice = match codec {
+ Some(codec) => {
+ codec.compress(buffer.as_slice(), &mut compressed)?;
+ compressed.as_slice()
+ }
+ None => buffer.as_slice(),
+ };
+ let len = buf_slice.len();
let pad_len = pad_to_8(len as u32);
let total_len: i64 = (len + pad_len) as i64;
// assert_eq!(len % 8, 0, "Buffer width not a multiple of 8 bytes");
buffers.push(ipc::Buffer::new(offset, total_len));
Review comment:
@nevi-me perhaps the arg `total_len` of
`buffers.push(ipc::Buffer::new(...)` should not account for `pad_len`. After
fixing this possible bug, my test on pyarrow passed and no failure with `cargo
test`, but I'm doubting if this is the root cause, because:
- I changed quite a lot based on your PR.
- This line was last updated in PR ARROW-518 a year ago.
I took quite some time to compare with the C++ code, it looks c++ set buffer
size as actual memory size.
The [IPC spec
recordbatch-message](https://github.com/apache/arrow/blob/master/docs/source/format/Columnar.rst#recordbatch-message)
states that:
`
The size field of Buffer is not required to account for padding bytes. Since
this metadata can be used to communicate in-memory pointer addresses between
libraries, it is recommended to set size to the actual memory size rather than
the padded size.
`
FYI:
https://github.com/apache/arrow/commit/68a558be866927435abffb33ccb606aa6f1c9390
at https://github.com/apache/arrow/compare/master...mqy:nevi-me_ARROW-8676
based on your PR.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]