scott-routledge2 commented on code in PR #48171:
URL: https://github.com/apache/arrow/pull/48171#discussion_r2586541152


##########
cpp/src/arrow/compute/kernels/scalar_cast_string.cc:
##########
@@ -304,10 +305,33 @@ BinaryToBinaryCastExec(KernelContext* ctx, const 
ExecSpan& batch, ExecResult* ou
     }
   }
 
-  // Start with a zero-copy cast, but change indices to expected size
-  RETURN_NOT_OK(ZeroCopyCastExec(ctx, batch, out));
-  return CastBinaryToBinaryOffsets<typename I::offset_type, typename 
O::offset_type>(
-      ctx, input, out->array_data().get());
+  if constexpr (sizeof(typename I::offset_type) != sizeof(typename 
O::offset_type)) {
+    std::shared_ptr<ArrayData> input_arr = input.ToArrayData();
+    ArrayData* output = out->array_data().get();
+    output->length = input_arr->length;
+    // output->offset is set below
+    output->SetNullCount(input_arr->null_count);
+    output->buffers = std::move(input_arr->buffers);
+
+    // Slice buffers to reduce allocation when casting the offsets buffer
+    int64_t offset = input_arr->offset;
+    size_t input_offset_type_size = sizeof(typename I::offset_type);
+    if (output->null_count != 0 && output->buffers[0]) {
+      // Avoid reallocation of the validity buffer by allowing some padding 
bits
+      output->offset = input_arr->offset % 8;
+    } else {
+      output->offset = 0;
+    }
+    if (output->buffers[0]) {
+      output->buffers[0] = SliceBuffer(output->buffers[0], offset / 8);
+    }
+    output->buffers[1] = SliceBuffer(output->buffers[1], offset * 
input_offset_type_size);

Review Comment:
   I believe in order for this slicing logic to be fully correct it needs to 
take into account `output->offset` when doing the slicing as well as the 
minimum size requirements. So it would look something like
   
   ``` c++
   int64_t offset_buffer_offset = (input_offset - output_offset) * 
offset_type_size
   int64_t offset_buffer_length = (length + output_offset + 1) * 
offset_type_size
   if (offset_buffer_length < minimum_size) {
     // update the output->offset so that it's now == minimum size
   }
   ``` 
   ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to