adamreeve opened a new pull request, #41230:
URL: https://github.com/apache/arrow/pull/41230

   ### Rationale for this change
   
   This reduces file sizes when writing sliced binary or list arrays to IPC 
format.
   
   ### What changes are included in this PR?
   
   Changes `ArrowStreamWriter` so that, when writing a `ListArray` or 
`BinaryArray`, it writes only the subset of values that is needed rather than 
the full value buffer, and computes shifted value offset buffers to match.
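
   As a rough illustration of the idea (not the C# code in this PR), the offset shifting and value trimming can be sketched in plain Python. The function name `slice_for_ipc` and the list/bytes representation of the buffers are assumptions made for this example only.

```python
def slice_for_ipc(offsets, values, start, length):
    """Return (shifted_offsets, trimmed_values) for the slice [start, start + length).

    Hypothetical helper: `offsets` is a list of ints and `values` is a bytes
    object, standing in for Arrow's value offset and value buffers.
    """
    # An array with n elements has n + 1 offsets, so take one extra entry.
    window = offsets[start : start + length + 1]
    first, last = window[0], window[-1]
    # Rebase the offsets so the first element of the slice starts at zero.
    shifted = [o - first for o in window]
    # Keep only the bytes that the slice actually references.
    trimmed = values[first:last]
    return shifted, trimmed


# Binary array ["aa", "bbb", "", "cccc"], sliced to elements 1..2.
offsets = [0, 2, 5, 5, 9]
values = b"aabbbcccc"
shifted, trimmed = slice_for_ipc(offsets, values, start=1, length=2)
```

   In this sketch only 3 of the 9 value bytes are kept for the slice, which is where the file size reduction comes from.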
   
   ### Are these changes tested?
   
   This code is covered by existing tests and the change doesn't introduce any 
difference in the observed array values, so I haven't added new tests or checks.
   
   I did change how list arrays are compared, though, because we can no longer 
compare the value and value offset buffers directly. The tests now get the list 
items as arrays and create a new `ArrayComparer` to compare them. As a result, 
array offsets are no longer always zero, so I've changed the offset assertions 
to only be used in strict mode.
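
   To show why a direct buffer comparison no longer works, here is a small Python sketch (again an illustration, not the test code in this PR). The helper `logical_items` is hypothetical; it reads each element through the offsets and compares the results instead of the raw buffers.

```python
def logical_items(offsets, values, array_offset, length):
    """Materialize the elements of a binary array slice as a list of bytes.

    Hypothetical helper: `array_offset` plays the role of the array's
    logical offset into its buffers.
    """
    items = []
    for i in range(array_offset, array_offset + length):
        items.append(values[offsets[i] : offsets[i + 1]])
    return items


# The original sliced array: full buffers with a non-zero array offset.
original = logical_items([0, 2, 5, 5, 9], b"aabbbcccc", array_offset=1, length=2)
# The same data after a round trip: trimmed buffers with a zero offset.
round_tripped = logical_items([0, 3, 3], b"bbb", array_offset=0, length=2)

# The raw buffers differ, but the logical contents are identical.
buffers_equal = ([0, 2, 5, 5, 9], b"aabbbcccc") == ([0, 3, 3], b"bbb")
items_equal = original == round_tripped
```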
   
   ### Are there any user-facing changes?
   
   Yes, this might reduce IPC file sizes for users writing sliced data.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

