felipecrv commented on code in PR #48171:
URL: https://github.com/apache/arrow/pull/48171#discussion_r2543142180


##########
cpp/src/arrow/compute/kernels/scalar_cast_string.cc:
##########
@@ -304,8 +305,21 @@ BinaryToBinaryCastExec(KernelContext* ctx, const ExecSpan& batch, ExecResult* ou
     }
   }
 
-  // Start with a zero-copy cast, but change indices to expected size
-  RETURN_NOT_OK(ZeroCopyCastExec(ctx, batch, out));
+  std::shared_ptr<ArrayData> input_arr = input.ToArrayData();
+  ArrayData* output = out->array_data().get();
+  output->length = input_arr->length;
+  output->SetNullCount(input_arr->null_count);
+  output->buffers = std::move(input_arr->buffers);
+  output->child_data = std::move(input_arr->child_data);
+
+  if (output->buffers[0]) {
+    // If reusing the null bitmap, ensure offset into the first byte is the same as input.
+    output->offset = input_arr->offset % 8;
+    output->buffers[0] = SliceBuffer(output->buffers[0], input_arr->offset / 8);
+  } else {
+    output->offset = 0;
+  }
+

Review Comment:
   We need to make sure all cases are covered correctly here.
   `enable_if_t<is_base_binary_type<I>::value && is_base_binary_type<O>::value, Status>`
   covers all pairs of `Binary/LargeBinary/String/LargeString`.
   
   `if constexpr (!I::is_utf8 && O::is_utf8) {` handles the UTF-8 validation
   when casting from `{Binary, LargeBinary}` to `{String, LargeString}`.
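   
   As a rough, standalone illustration of the two conditions above (not Arrow
   code): the `Mock*` structs below are hypothetical stand-ins that model only
   the `offset_type` and `is_utf8` members, and `NeedsUtf8Validation` is a
   made-up helper name. It spells out which pairs need validation:
   
   ```cpp
   #include <cstdint>

   // Hypothetical stand-ins for the four base binary types; only the two
   // members relevant to the dispatch are modeled.
   struct MockBinary {
     using offset_type = std::int32_t;
     static constexpr bool is_utf8 = false;
   };
   struct MockLargeBinary {
     using offset_type = std::int64_t;
     static constexpr bool is_utf8 = false;
   };
   struct MockString {
     using offset_type = std::int32_t;
     static constexpr bool is_utf8 = true;
   };
   struct MockLargeString {
     using offset_type = std::int64_t;
     static constexpr bool is_utf8 = true;
   };

   // True only when raw bytes are reinterpreted as UTF-8 text.
   template <typename I, typename O>
   constexpr bool NeedsUtf8Validation() {
     return !I::is_utf8 && O::is_utf8;
   }

   // Binary -> String pairs need validation, regardless of offset width.
   static_assert(NeedsUtf8Validation<MockBinary, MockString>());
   static_assert(NeedsUtf8Validation<MockLargeBinary, MockString>());
   // String -> String and anything -> Binary do not.
   static_assert(!NeedsUtf8Validation<MockString, MockLargeString>());
   static_assert(!NeedsUtf8Validation<MockString, MockBinary>());
   ```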
   
   The allocation issue only arises when the offset width changes (32-bit to
   64-bit or 64-bit to 32-bit), so you can shape your new code like this:
   
   ```cpp
   if constexpr (sizeof(typename I::offset_type) != sizeof(typename O::offset_type)) {
     std::shared_ptr<ArrayData> input_arr = input.ToArrayData();
     ArrayData* output = out->array_data().get();
     // ...
     if (output->buffers[0]) {
       // ...
     }

     return CastBinaryToBinaryOffsets<typename I::offset_type, typename O::offset_type>(
         ctx, input, out->array_data().get());
   } else {
     return ZeroCopyCastExec(ctx, batch, out);
   }
   ```
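   
   To make the offset-width branch concrete, here is a minimal sketch outside
   of Arrow. `ChoosePath` and `ConvertOffsets` are hypothetical names; this is
   not how `CastBinaryToBinaryOffsets` is implemented. It only shows why a
   width change needs a freshly allocated offsets buffer, and why narrowing
   has to check for overflow:
   
   ```cpp
   #include <cstdint>
   #include <limits>
   #include <optional>
   #include <vector>

   enum class CastPath { kZeroCopy, kRewriteOffsets };

   // Same offset width: buffers can be shared. Different width: offsets
   // must be rewritten into a new buffer.
   template <typename InOffset, typename OutOffset>
   constexpr CastPath ChoosePath() {
     if constexpr (sizeof(InOffset) != sizeof(OutOffset)) {
       return CastPath::kRewriteOffsets;
     } else {
       return CastPath::kZeroCopy;
     }
   }

   // Hypothetical helper: copies offsets into a vector of the output width
   // and returns std::nullopt if a narrowing conversion would overflow.
   template <typename InOffset, typename OutOffset>
   std::optional<std::vector<OutOffset>> ConvertOffsets(
       const std::vector<InOffset>& in) {
     if constexpr (sizeof(InOffset) > sizeof(OutOffset)) {
       // Offsets are non-decreasing, so checking the last one is enough.
       constexpr auto kMax =
           static_cast<InOffset>(std::numeric_limits<OutOffset>::max());
       if (!in.empty() && in.back() > kMax) {
         return std::nullopt;
       }
     }
     std::vector<OutOffset> out;
     out.reserve(in.size());
     for (InOffset v : in) {
       out.push_back(static_cast<OutOffset>(v));
     }
     return out;
   }
   ```
   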
   ----
   
   Useful diagram I made some time ago: https://gist.github.com/felipecrv/3c02f3784221d946dec1b031c6d400db


