bkietz commented on code in PR #5525:
URL: https://github.com/apache/arrow-rs/pull/5525#discussion_r1532546708
##########
arrow-ipc/src/writer.rs:
##########
@@ -1247,6 +1315,17 @@ fn write_array_data(
compression_codec,
)?;
}
+ } else if matches!(data_type, DataType::BinaryView | DataType::Utf8View) {
+ // Todo: if the array has a offset, we should slice the ArrayData to
save space.
Review Comment:
To be clear, a view type array has no offsets buffer but it might still have
an offset. Slicing the views buffer is safe and easy, but pruning unneeded data
buffers is much more nuanced since it's complicated to prove that no views
reference the pruned buffers
##########
arrow-ipc/src/writer.rs:
##########
@@ -547,6 +564,57 @@ impl IpcDataGenerator {
}
}
+fn set_variadic_buffer_counts(counts: &mut Vec<i64>, array: &dyn Array) {
+ match array.data_type() {
+ DataType::BinaryView | DataType::Utf8View => {
+ // The spec is not clear on whether the view/null buffer should be
included in the variadic buffer count.
+ // But from C++ impl
https://github.com/apache/arrow/blob/b448b33808f2dd42866195fa4bb44198e2fc26b9/cpp/src/arrow/ipc/writer.cc#L477
+ // we know they are not included.
Review Comment:
FWIW, the expected values of buffer counts are outlined here in an example
https://github.com/bkietz/arrow/blob/46758bc3c6321d8b7013acf52bff7761a8b33eda/docs/source/format/Columnar.rst?plain=1#L1274-L1278
It is correct that only the variadic buffers are counted in
VariadicBufferCounts.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]