scott-routledge2 commented on code in PR #48171:
URL: https://github.com/apache/arrow/pull/48171#discussion_r2585769433
##########
cpp/src/arrow/compute/kernels/scalar_cast_string.cc:
##########
@@ -304,8 +305,21 @@ BinaryToBinaryCastExec(KernelContext* ctx, const ExecSpan& batch, ExecResult* ou
}
}
- // Start with a zero-copy cast, but change indices to expected size
- RETURN_NOT_OK(ZeroCopyCastExec(ctx, batch, out));
+ std::shared_ptr<ArrayData> input_arr = input.ToArrayData();
+ ArrayData* output = out->array_data().get();
+ output->length = input_arr->length;
+ output->SetNullCount(input_arr->null_count);
+ output->buffers = std::move(input_arr->buffers);
+ output->child_data = std::move(input_arr->child_data);
+
+ if (output->buffers[0]) {
+ // If reusing the null bitmap, ensure offset into the first byte is the same as input.
+ output->offset = input_arr->offset % 8;
+ output->buffers[0] = SliceBuffer(output->buffers[0], input_arr->offset / 8);
+ } else {
+ output->offset = 0;
+ }
Review Comment:
Sorry for the late reply. What you are saying makes sense, but I am still a
little confused about the specifics of the slicing here. Wouldn't we want to
slice the offsets buffer by a different value than the validity buffer?
For example, if we are casting a slice that has length 8 and offset 8, we
would slice the validity buffer by 1 and be left with a buffer of length 1
holding the 8 validity bits for the elements in our casted slice. We would
also slice the offsets buffer by 1, which would leave it with length
17*offset_size - 1 and out of alignment.
Similarly, in the case where we have `output->null_count == 0` (and
`buffers[0] != nullptr`), we would slice the offsets buffer by 8, leaving a
buffer of size 17*offset_size - 8, and we would also slice the validity buffer
by 8, which would go out of bounds.
Wouldn't we want to slice the offsets buffer by
`input_arr->offset * sizeof(typename I::offset_type)` and the validity buffer
by `input_arr->offset / 8`?
Edit: I think the "offset" in SliceBuffer is the byte offset, as opposed to
the logical offset.
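To make the arithmetic concrete, here is a small standalone sketch of the byte offsets I have in mind. It does not use Arrow at all; `SliceBytes`, `ComputeSliceBytes` and `OffsetsBufferBytes` are hypothetical names made up for illustration. It also assumes the output keeps `offset % 8` as its logical offset, in which case the offsets buffer can only drop the first `offset - offset % 8` elements (this equals `offset` whenever the offset is a multiple of 8, as in the example above).

```cpp
#include <cstdint>

// Hypothetical helper (not an Arrow API): byte offsets to use when slicing
// the buffers of a binary/string array that has a non-zero logical offset.
struct SliceBytes {
  int64_t validity_byte_offset;  // whole bytes dropped from the validity bitmap
  int64_t remaining_bit_offset;  // leftover bits, kept as the new logical offset
  int64_t offsets_byte_offset;   // bytes dropped from the offsets buffer
};

// Assumption: the output keeps `logical_offset % 8` as its logical offset, so
// the offsets buffer may only drop the elements covered by the dropped
// validity bytes, i.e. (logical_offset / 8) * 8 of them.
template <typename OffsetType>
constexpr SliceBytes ComputeSliceBytes(int64_t logical_offset) {
  return SliceBytes{
      logical_offset / 8, logical_offset % 8,
      (logical_offset / 8) * 8 * static_cast<int64_t>(sizeof(OffsetType))};
}

// Size in bytes of an offsets buffer that backs `logical_offset + length`
// elements (one extra entry marks the end of the last element).
template <typename OffsetType>
constexpr int64_t OffsetsBufferBytes(int64_t logical_offset, int64_t length) {
  return (logical_offset + length + 1) *
         static_cast<int64_t>(sizeof(OffsetType));
}
```

For the length 8, offset 8 case with 32-bit offsets, `ComputeSliceBytes<int32_t>(8)` gives a validity slice of 1 byte and an offsets slice of 32 bytes, so the 68-byte offsets buffer shrinks to 36 bytes (9 aligned entries) rather than 67.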