pitrou commented on code in PR #45108:
URL: https://github.com/apache/arrow/pull/45108#discussion_r1904313094
##########
cpp/src/arrow/compute/key_map_internal.h:
##########
@@ -214,6 +214,14 @@ class ARROW_EXPORT SwissTable {
int log_minibatch_;
// Base 2 log of the number of blocks
int log_blocks_ = 0;
+ // When log_blocks_ is greater than 25, there will be overlapping bits
between block id
+ // and stamp within a 32-bit hash value. So we must check if this is the
case when
+ // right-shifting a hash value to retrieve block id and stamp. The following
two
+ // variables will be derived from log_blocks_ as log_blocks changes, and use
them as the
+ // number of bits to right shift, rather than branching on whether
log_blocks_ > 25
+ // every time in tight loops.
+ int bits_shift_for_block_and_stamp_ = bits_hash_ - log_blocks_ - bits_stamp_;
+ int bits_shift_for_block_ = bits_stamp_;
Review Comment:
Since the computation is repeated several times, perhaps we can have a short
helper function to factor it out? Something like:
```c++
static std::pair<int, int> ComputeBitShifts(int log_blocks) {
if (log_blocks + bits_stamp_ > bits_hash_) {
return {0, bits_hash_ - log_blocks};
} else {
return {bits_hash_ - log_blocks - bits_stamp_, bits_stamp_};
}
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]