Rich-T-kid commented on PR #10206:
URL: https://github.com/apache/arrow-rs/pull/10206#issuecomment-4785443780

   Nice, decode path shows a large performance boost. I think the largest 
performance boost is finding a way to align the buffers that are being decoded 
without needing to copy every single buffer.
   
   This is difficult because of the lack of control we have over the byte 
offset that is handed to us by tonic. looking at 
   ```
   pub fn align_buffers(&mut self) {
           let layout = layout(&self.data_type);
           for (buffer, spec) in self.buffers.iter_mut().zip(&layout.buffers) {
               if let BufferSpec::FixedWidth { alignment, .. } = spec {
                   if buffer.as_ptr().align_offset(*alignment) != 0 {
                       *buffer = Buffer::from_slice_ref(buffer.as_ref());
                   }
               }
           }
           // align children data recursively
           for data in self.child_data.iter_mut() {
               data.align_buffers()
           }
       }
   ```
   it seems that the math for checking if the buffer is aligned just comes down 
to `ptr_address % alignment (64)` if this isnt 0 it copys the buffer into a new 
buffer that is aligned correctly. 
[MutableBuffer](https://docs.rs/arrow/latest/arrow/buffer/struct.MutableBuffer.html)
 "Buffers created from MutableBuffer (via into) are guaranteed to be aligned 
along cache lines and in multiples of 64 bytes."
   
   From my understanding arrow-ipc adds padding within the IPC body to account 
for alignment.
   <img width="2098" height="1176" alt="Image 6-23-26 at 10 43 PM" 
src="https://github.com/user-attachments/assets/5f520fb7-00f6-4780-adb1-17cabccbbb73";
 />
   
   The issue occurs when the initial buffer returned by Tonic starts at a 
misaligned byte offset. For example, if the buffer starts at byte offset 253, 
it isn't aligned to a 32 or 64-byte boundary. Because Arrow IPC's internal 
padding is relative to the start of the buffer, a misaligned starting offset 
means all the sub-buffers inside it are also misaligned, the padding does 
nothing to fix this. The result is that every sub-buffer has to be copied into 
a fresh, correctly aligned allocation anyway.
   Since alignment is determined by where the overall buffer starts, the fix 
should happen upfront: check whether the buffer Tonic returns is aligned before 
beginning to decode it, and if not, reallocate it once at that point rather 
than discovering the problem per sub-buffer during decoding.
   
   This would avoid N buffer copies in exchange for 1 large buffer copy at the 
start.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to