mqy commented on a change in pull request #9137:
URL: https://github.com/apache/arrow/pull/9137#discussion_r555029409



##########
File path: rust/arrow/src/ipc/writer.rs
##########
@@ -712,15 +762,24 @@ fn write_buffer(
     buffers: &mut Vec<ipc::Buffer>,
     arrow_data: &mut Vec<u8>,
     offset: i64,
-) -> i64 {
-    let len = buffer.len();
+    codec: Option<&Codec>,
+) -> Result<i64> {
+    let mut compressed = Vec::new();
+    let buf_slice = match codec {
+        Some(codec) => {
+            codec.compress(buffer.as_slice(), &mut compressed)?;
+            compressed.as_slice()
+        }
+        None => buffer.as_slice(),
+    };
+    let len = buf_slice.len();
     let pad_len = pad_to_8(len as u32);
     let total_len: i64 = (len + pad_len) as i64;
     // assert_eq!(len % 8, 0, "Buffer width not a multiple of 8 bytes");
     buffers.push(ipc::Buffer::new(offset, total_len));

Review comment:
       @nevi-me perhaps the arg `total_len` of 
`buffers.push(ipc::Buffer::new(...)` should not account for `pad_len`. After 
fixing this possible bug, my test on pyarrow passed and no failure with `cargo 
test`, but I'm doubting if this is the root cause, because:
   - I changed quite a lot based on your PR.
   - This line was last updated in PR ARROW-518 a year ago.
   
   I took quite some time to compare with the C++ code, it looks c++ set buffer 
size as actual memory size.
   The [IPC spec 
recordbatch-message](https://github.com/apache/arrow/blob/master/docs/source/format/Columnar.rst#recordbatch-message)
 states that:
   
   `
   The size field of Buffer is not required to account for padding bytes. Since 
this metadata can be used to communicate in-memory pointer addresses between 
libraries, it is recommended to set size to the actual memory size rather than 
the padded size.
   `
   
   FYI: 
https://github.com/apache/arrow/commit/68a558be866927435abffb33ccb606aa6f1c9390 
at https://github.com/apache/arrow/compare/master...mqy:nevi-me_ARROW-8676 
based on your PR.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to