pitrou commented on code in PR #45108:
URL: https://github.com/apache/arrow/pull/45108#discussion_r1904334656


##########
cpp/src/arrow/compute/key_map_internal_avx2.cc:
##########
@@ -392,16 +392,32 @@ int SwissTable::extract_group_ids_avx2(const int num_keys, const uint32_t* hashe
   } else {
     for (int i = 0; i < num_keys / unroll; ++i) {
       __m256i hash = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(hashes) + i);
+      // Extend hash and local_slot to 64-bit to compute 64-bit group id offsets to
+      // gather.
+      // This is to prevent index overflow issues in GH-44513.
+      // NB: Use zero-extend conversion for unsigned hash.
+      __m256i hash_lo = _mm256_cvtepu32_epi64(_mm256_castsi256_si128(hash));
+      __m256i hash_hi = _mm256_cvtepu32_epi64(_mm256_extracti128_si256(hash, 1));
       __m256i local_slot =
           _mm256_set1_epi64x(reinterpret_cast<const uint64_t*>(local_slots)[i]);
       local_slot = _mm256_shuffle_epi8(

Review Comment:
   Hmm... so this first uses `_mm256_shuffle_epi8` to expand from 8-bit to 32-bit lanes, and then `_mm256_cvtepi32_epi64` below expands from 32-bit to 64-bit lanes? Would it be quicker to shuffle directly from 8-bit to 64-bit (twice, I suppose)?
   
   (Interestingly, `_mm256_shuffle_epi8` is faster than `_mm256_cvtepi32_epi64` according to https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_shuffle_epi8&ig_expand=1798,6006,1628,6006)
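   
   For illustration, a rough sketch of what that direct 8-bit-to-64-bit expansion could look like (a hypothetical standalone helper, not code from this PR; the name `ExpandLocalSlotsToEpi64` and the shuffle constants are made up, and it assumes the 8 `local_slots` bytes for the current unroll step are broadcast with `_mm256_set1_epi64x` exactly as in the existing code):
   
   ```cpp
   #include <immintrin.h>
   
   #include <cstdint>
   
   // Expand 8 packed 8-bit local slot values straight into two vectors of
   // zero-extended 64-bit lanes, using _mm256_shuffle_epi8 twice instead of
   // shuffling to 32-bit lanes and then widening with _mm256_cvtepi32_epi64.
   // A shuffle-control byte with the high bit set (-128) writes zero, which
   // provides the zero-extension for free.
   inline void ExpandLocalSlotsToEpi64(const uint8_t* local_slots, __m256i* slot_lo,
                                       __m256i* slot_hi) {
     // Broadcast the 8 slot bytes into every 64-bit lane (as the PR already does).
     __m256i packed =
         _mm256_set1_epi64x(reinterpret_cast<const uint64_t*>(local_slots)[0]);
   
     // _mm256_shuffle_epi8 shuffles within each 128-bit lane; because the 8 bytes
     // are replicated in both halves, each 64-bit lane can pick out its own byte.
     const __m256i kShuffleLo = _mm256_setr_epi8(
         0, -128, -128, -128, -128, -128, -128, -128,   // 64-bit lane 0 <- byte 0
         1, -128, -128, -128, -128, -128, -128, -128,   // 64-bit lane 1 <- byte 1
         2, -128, -128, -128, -128, -128, -128, -128,   // 64-bit lane 2 <- byte 2
         3, -128, -128, -128, -128, -128, -128, -128);  // 64-bit lane 3 <- byte 3
     const __m256i kShuffleHi = _mm256_setr_epi8(
         4, -128, -128, -128, -128, -128, -128, -128,   // 64-bit lane 0 <- byte 4
         5, -128, -128, -128, -128, -128, -128, -128,   // 64-bit lane 1 <- byte 5
         6, -128, -128, -128, -128, -128, -128, -128,   // 64-bit lane 2 <- byte 6
         7, -128, -128, -128, -128, -128, -128, -128);  // 64-bit lane 3 <- byte 7
   
     *slot_lo = _mm256_shuffle_epi8(packed, kShuffleLo);  // slots 0..3, zero-extended
     *slot_hi = _mm256_shuffle_epi8(packed, kShuffleHi);  // slots 4..7, zero-extended
   }
   ```
   
   Whether that is a net win in context would of course need benchmarking.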


