scott-routledge2 commented on code in PR #48171:
URL: https://github.com/apache/arrow/pull/48171#discussion_r2586541152
##########
cpp/src/arrow/compute/kernels/scalar_cast_string.cc:
##########
@@ -304,10 +305,33 @@ BinaryToBinaryCastExec(KernelContext* ctx, const
ExecSpan& batch, ExecResult* ou
}
}
- // Start with a zero-copy cast, but change indices to expected size
- RETURN_NOT_OK(ZeroCopyCastExec(ctx, batch, out));
- return CastBinaryToBinaryOffsets<typename I::offset_type, typename
O::offset_type>(
- ctx, input, out->array_data().get());
+ if constexpr (sizeof(typename I::offset_type) != sizeof(typename
O::offset_type)) {
+ std::shared_ptr<ArrayData> input_arr = input.ToArrayData();
+ ArrayData* output = out->array_data().get();
+ output->length = input_arr->length;
+ // output->offset is set below
+ output->SetNullCount(input_arr->null_count);
+ output->buffers = std::move(input_arr->buffers);
+
+ // Slice buffers to reduce allocation when casting the offsets buffer
+ int64_t offset = input_arr->offset;
+ size_t input_offset_type_size = sizeof(typename I::offset_type);
+ if (output->null_count != 0 && output->buffers[0]) {
+ // Avoid reallocation of the validity buffer by allowing some padding
bits
+ output->offset = input_arr->offset % 8;
+ } else {
+ output->offset = 0;
+ }
+ if (output->buffers[0]) {
+ output->buffers[0] = SliceBuffer(output->buffers[0], offset / 8);
+ }
+ output->buffers[1] = SliceBuffer(output->buffers[1], offset *
input_offset_type_size);
Review Comment:
I believe in order for this slicing logic to be fully correct it needs to
take into account `output->offset` when doing the slicing as well as the
minimum size requirements. So it would look something like
``` c++
int64_t offset_buffer_offset = (input_offset - output_offset) *
offset_type_size
int64_t offset_buffer_length = (length + output_offset + 1) *
offset_type_size
if (offset_buffer_length < minimum_size) {
// update the output->offset so that it's now == minimum size
}
```
?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]