adamreeve opened a new pull request, #41197:
URL: https://github.com/apache/arrow/pull/41197

   ### Rationale for this change
   
   Fixes writing sliced arrays to IPC files or streams so that they can be 
successfully read back in. Previously, writing such data would succeed, but the 
resulting file or stream could not be read.
   
   ### What changes are included in this PR?
   
   * Fixes `BinaryViewArray.GetBytes` to account for the array offset
   * Fixes `FixedSizeBinaryArray.GetBytes` to account for the array offset
   * Updates `ArrowStreamWriter` so that it writes slices of buffers when 
required, and handles slicing bitmap arrays by creating a copy if the offset 
isn't a multiple of 8
   * Refactors `ArrowStreamWriter`, making the 
`ArrowRecordBatchFlatBufferBuilder` class responsible for building a list of 
field nodes as well as buffers. This was required to avoid having to duplicate 
logic for handling array types with child data between the 
`ArrowRecordBatchFlatBufferBuilder` class and the 
`CreateSelfAndChildrenFieldNodes` method, which I've removed.
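
To illustrate why a bitmap needs a copy when the slice offset isn't a multiple of 8, here is a minimal sketch in Python. It is not the C# code from this PR; the function name `slice_bitmap` is hypothetical, and it assumes only Arrow's least-significant-bit-first bitmap layout. A byte-aligned offset can reuse a plain byte range, while an unaligned offset requires every bit to be moved to a new position.

```python
def slice_bitmap(bitmap: bytes, offset: int, length: int) -> bytes:
    """Return a bitmap holding bits [offset, offset + length) of `bitmap`,
    repositioned so the first selected bit sits at bit 0.

    Hypothetical helper for illustration; bits are LSB-first within a byte.
    """
    if offset < 0 or length < 0 or offset + length > len(bitmap) * 8:
        raise ValueError("slice is out of range")

    n_bytes = (length + 7) // 8
    if offset % 8 == 0:
        # Byte-aligned: a plain byte range is enough, no bit shifting needed.
        # Bits past `length` in the last byte are left as they were.
        start = offset // 8
        return bytes(bitmap[start:start + n_bytes])

    # Unaligned: copy bit by bit into a fresh, zero-initialized buffer.
    out = bytearray(n_bytes)
    for i in range(length):
        src = offset + i
        if (bitmap[src // 8] >> (src % 8)) & 1:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)
```

For example, `slice_bitmap(bytes([0b10110100, 0b00000011]), 2, 8)` gathers bits 2 through 9 of the input into a single new byte, which cannot be expressed as a byte range of the original buffer.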
   
   Note that after this change, we still write more data than required when 
writing a slice of a `ListArray`, `BinaryArray`, `ListViewArray`, 
`BinaryViewArray` or `DenseUnionArray`. When writing a `ListArray`, for example, 
we write slices of the null bitmap and value offsets but still write the full 
values array. Ideally, we should write only the referenced slice of the values 
and adjust the value offsets so they start at zero. The C++ implementation, for 
example, handles this 
[here](https://github.com/apache/arrow/blob/18c74b0733c9ff473a211259cf10705b2c9be891/cpp/src/arrow/ipc/writer.cc#L316).
I will create a follow-up issue for this once this PR is merged.
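
The ideal behaviour described above (writing only the referenced values and rebasing the offsets to zero) can be sketched as follows. This is a hedged illustration in Python, not the C# or C++ implementation; `slice_list` and its parameters are hypothetical names, and the offsets and values are modelled as plain Python sequences rather than Arrow buffers.

```python
def slice_list(offsets, values, start, length):
    """Slice `length` lists beginning at list index `start`.

    Returns (new_offsets, new_values) where new_offsets starts at zero and
    new_values contains only the elements the sliced lists refer to.
    Hypothetical helper for illustration only.
    """
    if start < 0 or length < 0 or start + length + 1 > len(offsets):
        raise ValueError("slice is out of range")

    # A slice of `length` lists needs `length + 1` offsets.
    window = offsets[start:start + length + 1]
    first, last = window[0], window[-1]

    # Rebase the offsets so the first one is zero, and keep only the
    # values between the first and last offset of the window.
    new_offsets = [o - first for o in window]
    new_values = values[first:last]
    return new_offsets, new_values


# Four lists: "ab", "cde", "f", "ghi"
offsets = [0, 2, 5, 6, 9]
values = list("abcdefghi")

# Slice the two middle lists ("cde" and "f").
new_offsets, new_values = slice_list(offsets, values, start=1, length=2)
```

In this example the sliced array needs only four of the nine values, whereas writing the full values array alongside the sliced offsets would write all nine.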
   
   ### Are these changes tested?
   
   Yes, I've added new unit tests for this.
   
   ### Are there any user-facing changes?
   
   Yes, this is a user-facing bug fix.

