Copilot commented on code in PR #49443:
URL: https://github.com/apache/arrow/pull/49443#discussion_r2902773597


##########
cpp/src/arrow/compute/kernels/scalar_if_else.cc:
##########
@@ -783,11 +794,22 @@ struct IfElseFunctor<Type, enable_if_base_binary<Type>> {
       auto* out_data = out->array_data().get();
       auto offset_length = (cond.length + 1) * sizeof(OffsetType);
       ARROW_ASSIGN_OR_RAISE(out_data->buffers[1], 
ctx->Allocate(offset_length));
-      std::memcpy(out_data->buffers[1]->mutable_data(), left_offsets, 
offset_length);
+
+      if (left.offset == 0) {
+        std::memcpy(out_data->buffers[1]->mutable_data(), left_offsets, 
offset_length);
+      } else {
+        OffsetType base = left_offsets[0];
+        auto* out_offsets =
+            
reinterpret_cast<OffsetType*>(out_data->buffers[1]->mutable_data());
+        for (int64_t i = 0; i <= cond.length; ++i) {
+          out_offsets[i] = left_offsets[i] - base;
+        }

Review Comment:
   Same issue as ASA path: using `left.offset == 0` to decide whether to memcpy 
offsets can produce invalid output if `left.offset == 0` but `left_offsets[0] 
!= 0`. Since the data is copied starting at `left_data + left_offsets[0]`, the 
output offsets should be normalized whenever `left_offsets[0]` is non-zero (or 
unconditionally), not only when the array has a non-zero element offset.



##########
cpp/src/arrow/compute/kernels/scalar_if_else.cc:
##########
@@ -743,11 +743,22 @@ struct IfElseFunctor<Type, enable_if_base_binary<Type>> {
       auto* out_data = out->array_data().get();
       auto offset_length = (cond.length + 1) * sizeof(OffsetType);
       ARROW_ASSIGN_OR_RAISE(out_data->buffers[1], 
ctx->Allocate(offset_length));
-      std::memcpy(out_data->buffers[1]->mutable_data(), right_offsets, 
offset_length);
+
+      if (right.offset == 0) {
+        std::memcpy(out_data->buffers[1]->mutable_data(), right_offsets, 
offset_length);
+      } else {
+        OffsetType base = right_offsets[0];
+        auto* out_offsets =
+            
reinterpret_cast<OffsetType*>(out_data->buffers[1]->mutable_data());
+        for (int64_t i = 0; i <= cond.length; ++i) {
+          out_offsets[i] = right_offsets[i] - base;
+        }

Review Comment:
   The decision to memcpy offsets based on `right.offset == 0` is insufficient. 
Binary arrays can be valid with `offset == 0` but `right_offsets[0] != 0` 
(ValidateFull only requires bounds/monotonicity). In that case this branch 
would copy non-zero-based offsets while the data copy starts at `right_data + 
right_offsets[0]`, producing invalid offsets relative to the copied data. 
Consider keying this optimization off `right_offsets[0] == 0` (or always 
normalizing by subtracting `right_offsets[0]`) so the output offsets are 
consistent with the copied data in all valid inputs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to