zanmato1984 commented on code in PR #46124:
URL: https://github.com/apache/arrow/pull/46124#discussion_r2045208022


##########
cpp/src/arrow/compute/row/compare_internal.cc:
##########
@@ -276,12 +276,13 @@ void KeyCompare::CompareVarBinaryColumnToRowHelper(
       int32_t tail_length = length - j * 8;
       uint64_t tail_mask = ~0ULL >> (64 - 8 * tail_length);
       uint64_t key_left = 0;
-      std::memcpy(&key_left, key_left_ptr + j, tail_length);
+      const uint8_t* src_bytes = reinterpret_cast<const uint8_t*>(key_left_ptr + j);
+      std::memcpy(&key_left, src_bytes, tail_length);
       uint64_t key_right = key_right_ptr[j];
       result_or |= tail_mask & (key_left ^ key_right);
     }
     int result = result_or == 0 ? 0xff : 0;
-    result *= (length_left == length_right ? 1 : 0);

Review Comment:
   From the error message in the issue:
   ```
   
   /Users/ripley/R/packages/tests-SAN/arrow/tools/cpp/src/arrow/compute/row/compare_internal.cc:284:30: runtime error: load of misaligned address 0x000150040c01 for type 'const uint64_t *' (aka 'const unsigned long long *'), which requires 8 byte alignment
   ```
   
   How does this line of code issue a misaligned load? Is it compiler
   reordering? I assume the code actually at fault is the `std::memcpy` above,
   right?
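
For readers following the thread, here is a minimal sketch of a tail comparison that never forms a `uint64_t*` to a possibly misaligned address. It is not the code in the PR: `LoadTailBytes` and `TailBytesEqual` are hypothetical names, and the sketch assumes the tail is 1 to 8 bytes long. Both sides are read with `std::memcpy` into a zero-initialized word, so no tail mask is needed here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of Arrow): read 1..8 bytes from an address
// with no alignment guarantee into a zero-padded 64-bit word.
inline uint64_t LoadTailBytes(const uint8_t* bytes, int tail_length) {
  assert(tail_length >= 1 && tail_length <= 8);  // assumption of this sketch
  uint64_t word = 0;
  std::memcpy(&word, bytes, static_cast<std::size_t>(tail_length));
  return word;
}

// Hypothetical helper: compare the first `tail_length` bytes of two buffers.
// Bytes beyond `tail_length` are zero on both sides, so XOR is enough.
inline bool TailBytesEqual(const uint8_t* left, const uint8_t* right,
                           int tail_length) {
  return (LoadTailBytes(left, tail_length) ^
          LoadTailBytes(right, tail_length)) == 0;
}
```

Calling `TailBytesEqual(a + 1, b + 1, 3)` on two byte buffers is well defined even though `a + 1` and `b + 1` are not 8-byte aligned, because only `uint8_t*` values are ever formed.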



##########
cpp/src/arrow/compute/row/compare_internal.cc:
##########
@@ -276,12 +276,13 @@ void KeyCompare::CompareVarBinaryColumnToRowHelper(
       int32_t tail_length = length - j * 8;
       uint64_t tail_mask = ~0ULL >> (64 - 8 * tail_length);
       uint64_t key_left = 0;
-      std::memcpy(&key_left, key_left_ptr + j, tail_length);
+      const uint8_t* src_bytes = reinterpret_cast<const uint8_t*>(key_left_ptr + j);
+      std::memcpy(&key_left, src_bytes, tail_length);
       uint64_t key_right = key_right_ptr[j];
       result_or |= tail_mask & (key_left ^ key_right);
     }
     int result = result_or == 0 ? 0xff : 0;
-    result *= (length_left == length_right ? 1 : 0);

Review Comment:
   I'm also curious about that. If the problem is in `std::memcpy`, why does
   the pointer type (`uint64_t*` vs. `uint8_t*`) matter, given that
   `std::memcpy` accepts `void*`?
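
As a small illustration of the point about `void*`: the sketch below uses a hypothetical helper, `LoadU64`, that takes `const void*` exactly as `std::memcpy` does. Any object pointer converts to it implicitly, and the copy reads the same bytes whatever the caller's static pointer type was. Whether the sanitizer report in the issue comes from the copy itself or from something else on that line is not established in this thread.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of Arrow): load 8 bytes from any address.
// The parameter is `const void*`, so the caller's pointer type is erased
// before the copy happens; only the address and the byte count matter.
inline uint64_t LoadU64(const void* address) {
  uint64_t value;
  std::memcpy(&value, address, sizeof(value));
  return value;
}
```

For example, `LoadU64(bytes + 1)` with `bytes` of type `const uint8_t*` copies the eight bytes starting at that address without requiring 8-byte alignment.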



##########
cpp/src/arrow/compute/light_array_internal.cc:
##########
@@ -615,7 +615,9 @@ Status ExecBatchBuilder::AppendSelected(const std::shared_ptr<ArrayData>& source
                 target->mutable_data(2) +
                 offsets[num_rows_before + num_rows_to_process + i]);
             const uint64_t* src = reinterpret_cast<const uint64_t*>(ptr);
-            memcpy(dst, src, num_bytes);
+            uint8_t* dst_bytes = reinterpret_cast<uint8_t*>(dst);
+            const uint8_t* src_bytes = reinterpret_cast<const uint8_t*>(src);
+            memcpy(dst_bytes, src_bytes, num_bytes);

Review Comment:
   I think we can instead use `dst` without `reinterpret_cast`ing it to
   `uint64_t*`, and use `ptr` as the source:
   ```
   auto dst = target->mutable_data(2) +
              offsets[num_rows_before + num_rows_to_process + i];
   memcpy(dst, ptr, num_bytes);
   ```
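
A self-contained sketch of the same idea, kept separate from Arrow's types: the function name `CopyValueBytes` and its parameters are hypothetical stand-ins, assuming the buffers are plain byte arrays addressed by byte offsets. Source and destination stay `uint8_t*` throughout, so no wider pointer type is ever formed.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the variable-length copy (names are illustrative,
// not Arrow's API): copy `num_bytes` between byte buffers at given offsets.
inline void CopyValueBytes(uint8_t* target_data, uint32_t target_offset,
                           const uint8_t* source_data, uint32_t source_offset,
                           uint32_t num_bytes) {
  uint8_t* dst = target_data + target_offset;        // byte pointer, any alignment
  const uint8_t* ptr = source_data + source_offset;  // byte pointer, any alignment
  std::memcpy(dst, ptr, num_bytes);
}
```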


