Re: [PR] GH-48177: [C++][Parquet] Fix arrow-acero-asof-join-node-test failures on s390x [arrow]

via GitHub Fri, 05 Dec 2025 02:09:53 -0800


zanmato1984 commented on code in PR #48180:
URL: https://github.com/apache/arrow/pull/48180#discussion_r2592121965



##########
cpp/src/arrow/compute/util.cc:
##########
@@ -118,7 +124,22 @@ void bits_to_indexes_internal(int64_t hardware_flags, 
const int num_bits,
   // Optionally process the last partial word with masking out bits outside 
range
   if (tail) {
     const uint8_t* bits_tail = bits + (num_bits - tail) / 8;
+#if ARROW_LITTLE_ENDIAN
     uint64_t word = SafeLoadUpTo8Bytes(bits_tail, (tail + 7) / 8);
+#else
+    int tail_bytes = (tail + 7) / 8;
+    uint64_t word;
+    if (tail_bytes == 8) {
+      word = util::SafeLoad(reinterpret_cast<const uint64_t*>(bits_tail));
+    } else {
+      // For bit manipulation, always load into least significant bits
+      // to ensure compatibility with CountTrailingZeros on Big-endian systems
+      word = 0;
+      for (int i = 0; i < tail_bytes; ++i) {
+        word |= static_cast<uint64_t>(bits_tail[i]) << (8 * i);
+      }
+    }
+#endif

Review Comment:
   IIUC, here you want to load these bytes in little-endian to be further 
processed by `CountTrailingZeros`. What you are doing is not leveraging 
`SafeLoadUpTo8Bytes()`, which is supposed to load bytes in big-endian (and 
currently not implemented), but write your own little-endian loading.
   
   This should work. But I think we'd better do it the other way:
   1. Implement the big-endian loading in `SafeLoadUpTo8Bytes()` (you already 
did it in your previous commit), keep the call to it here, for both little- and 
big-endian.
   2. For big-endian, issue an explicit byte swapping for big-endian: ```#if 
!ARROW_LITTLE_ENDIAN word = bit_util::ByteSwap(word); #endif```
   
   This way, the code can be more compact and semantic clear. The cost is an 
extra byte-swapping, which is trivial imho. cc @kou 



##########
cpp/src/arrow/compute/util.cc:
##########
@@ -30,34 +30,40 @@ namespace util {
 namespace bit_util {
 
 inline uint64_t SafeLoadUpTo8Bytes(const uint8_t* bytes, int num_bytes) {
-  // This will not be correct on big-endian architectures.
-#if !ARROW_LITTLE_ENDIAN
-  ARROW_DCHECK(false);
-#endif
   ARROW_DCHECK(num_bytes >= 0 && num_bytes <= 8);
   if (num_bytes == 8) {
     return util::SafeLoad(reinterpret_cast<const uint64_t*>(bytes));

Review Comment:
   So we are not going to update this function for big-endian because it won't 
be called? If so, why don't we keep the above `DCHECK(false)`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] GH-48177: [C++][Parquet] Fix arrow-acero-asof-join-node-test failures on s390x [arrow]

Reply via email to