fenfeng9 opened a new pull request, #50166:
URL: https://github.com/apache/arrow/pull/50166
### Rationale for this change
Casting to `binary_view` or `string_view` could leave a null variadic
buffer slot when all values were inline. This could happen for casts from
`binary`, `large_binary`, `string`, `large_string`, and `fixed_size_binary`.
The C Data Interface exporter reads every variadic buffer to get its size.
Because of that, exporting such an array could crash, for example through
PyArrow `_export_to_c`.
Validation also passed for these arrays: when every value is inline,
validation never reads an out-of-line data buffer, so the null slot went
undetected.
### What changes are included in this PR?
This PR fixes the cast kernels so all-inline view arrays do not keep a null
variadic buffer slot.
It also makes validation reject null variadic buffer slots, and makes C Data
export return an error instead of crashing.
C++ and Python regression tests cover the cast, validation, and export paths.
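
As a rough illustration of the three behaviors described above, here is a minimal pure-Python sketch. It is a toy model, not Arrow's implementation: `ToyViewArray`, `cast_to_view`, `validate`, and `export_variadic_sizes` are hypothetical names invented for this example, and the single shared data buffer is a simplifying assumption. The only detail taken from the view layout itself is that values of up to 12 bytes are stored inline.

```python
from dataclasses import dataclass, field
from typing import List, Optional

INLINE_LIMIT = 12  # view types store values of up to 12 bytes inline


@dataclass
class ToyViewArray:
    """Hypothetical stand-in for a binary_view/string_view array."""
    values: List[bytes]
    # One entry per variadic (out-of-line) data buffer; None models a null slot.
    variadic_buffers: List[Optional[bytes]] = field(default_factory=list)


def cast_to_view(values: List[bytes]) -> ToyViewArray:
    """Toy cast: an all-inline result carries no variadic buffer slot at all."""
    if all(len(v) <= INLINE_LIMIT for v in values):
        return ToyViewArray(values, [])
    # Assumption for this sketch: out-of-line values share one data buffer.
    data = b"".join(v for v in values if len(v) > INLINE_LIMIT)
    return ToyViewArray(values, [data])


def validate(arr: ToyViewArray) -> None:
    """Reject null variadic buffer slots, even if no value points into them."""
    for i, buf in enumerate(arr.variadic_buffers):
        if buf is None:
            raise ValueError(f"variadic buffer {i} is null")


def export_variadic_sizes(arr: ToyViewArray) -> List[int]:
    """Toy export: reads every variadic buffer's size, so check for nulls first."""
    validate(arr)  # report an error instead of dereferencing a null buffer
    return [len(buf) for buf in arr.variadic_buffers]


if __name__ == "__main__":
    ok = cast_to_view([b"a", b"bc"])
    print(export_variadic_sizes(ok))

    bad = ToyViewArray([b"a"], [None])  # the shape the old cast could produce
    try:
        export_variadic_sizes(bad)
    except ValueError as exc:
        print("export rejected:", exc)
```

In this sketch the exporter reuses the validation check, so a null slot surfaces as a Python exception rather than a failed size read. The actual C++ changes may structure these checks differently.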
### Are these changes tested?
Yes.
### Are there any user-facing changes?
No.
**This PR contains a "Critical Fix".** Exporting an all-inline view array
through the C Data Interface could crash the process while using only public
APIs.
--
This is an automated message from the Apache Git Service.