scott-routledge2 commented on code in PR #48171:
URL: https://github.com/apache/arrow/pull/48171#discussion_r2604745630
##########
cpp/src/arrow/compute/kernels/scalar_cast_string.cc:
##########
@@ -304,10 +305,33 @@ BinaryToBinaryCastExec(KernelContext* ctx, const
ExecSpan& batch, ExecResult* ou
}
}
- // Start with a zero-copy cast, but change indices to expected size
- RETURN_NOT_OK(ZeroCopyCastExec(ctx, batch, out));
- return CastBinaryToBinaryOffsets<typename I::offset_type, typename
O::offset_type>(
- ctx, input, out->array_data().get());
+ if constexpr (sizeof(typename I::offset_type) != sizeof(typename
O::offset_type)) {
+ std::shared_ptr<ArrayData> input_arr = input.ToArrayData();
+ ArrayData* output = out->array_data().get();
+ output->length = input_arr->length;
+ // output->offset is set below
+ output->SetNullCount(input_arr->null_count);
+ output->buffers = std::move(input_arr->buffers);
+
+ // Slice buffers to reduce allocation when casting the offsets buffer
+ int64_t offset = input_arr->offset;
+ size_t input_offset_type_size = sizeof(typename I::offset_type);
+ if (output->null_count != 0 && output->buffers[0]) {
+ // Avoid reallocation of the validity buffer by allowing some padding
bits
+ output->offset = input_arr->offset % 8;
+ } else {
+ output->offset = 0;
+ }
+ if (output->buffers[0]) {
+ output->buffers[0] = SliceBuffer(output->buffers[0], offset / 8);
+ }
+ output->buffers[1] = SliceBuffer(output->buffers[1], offset *
input_offset_type_size);
Review Comment:
When I was testing the buffer slicing logic here (for example, casting a
slice with length 3 and offset 1) I was seeing an error that looked like:
```
'datum.make_array()->ValidateFull()' failed with Invalid: Offsets buffer
size (bytes): 16 isn't large enough for length: 3 and offset: 1
```
Since I think technically the offsets buffer needs to be big enough to hold
`offset + length + 1` elements whereas the way this is written it will only
hold `length + 1` many elements. However, because of the code structure here,
this path isn't actually reachable since we always reallocate the offsets
buffer with the correct size in `CastBinaryToBinaryOffsets` anyways.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]