This is an automated email from the ASF dual-hosted git repository.

panxiaolei pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
     new f5feae9bc0e [opt](function) Eliminate redundant hash computation in AggregateFunctionUniq (#61730)
f5feae9bc0e is described below

commit f5feae9bc0ebfda1ac19d4069d80d450765d2101
Author: Mryange <[email protected]>
AuthorDate: Tue Mar 31 10:05:30 2026 +0800

    [opt](function) Eliminate redundant hash computation in AggregateFunctionUniq (#61730)
    
    Problem Summary:

    In `AggregateFunctionUniq::add_batch_single_place` and `add_batch` (the
    hot path for `SELECT count(distinct col)`), every key's hash is computed
    **twice**:

    1. `set.prefetch(keys[i + HASH_MAP_PREFETCH_DIST])` internally calls
       `this->hash(key)` to locate the slot for CPU prefetch.
    2. `set.insert(keys[i])` goes through `emplace` → `EmplaceDecomposable` →
       `s.hash(key)` again to find or prepare the insert position.

    The same codebase already has a correct "precompute hash + reuse"
    pattern in the `DistinctStreamingAgg` operator path
    (`hash_map_context.h`), where `init_hash_values()` computes each hash
    once and subsequent `prefetch` / `lazy_emplace_key` calls reuse the
    precomputed value. This PR applies the same optimization to
    `AggregateFunctionUniq`.
    
    ### How does this PR solve it?

    For both `add_batch` and `add_batch_single_place` in
    `aggregate_function_uniq.h` and
    `aggregate_function_uniq_distribute_key.h`:

    1. **Precompute hash values** into a `std::vector<size_t>` before the
       main loop, using `set.hash(keys[i])` — this is the only hash
       computation per key.
    2. **Replace `set.prefetch(key)`** with
       `set.prefetch_hash(hash_values[...])` — this reuses the precomputed
       hash, avoiding both the recalculation and the unnecessary memory
       access to `keys[i + HASH_MAP_PREFETCH_DIST]` at prefetch time.
    3. **Replace `set.insert(key)`** with
       `set.emplace_with_hash(hash_values[i], key)` — this passes the
       precomputed hash directly, skipping the internal hash call.

    Both `prefetch_hash` and `emplace_with_hash` are existing APIs in phmap
    (`parallel_hashmap/phmap.h`); no third-party changes are needed.
    
    **Expected improvements:**
    - Hash computation reduced from 2× to 1× per key
    - The precompute loop is a pure sequential scan over `keys[]`, which is
      more cache-friendly than interleaving hash computation with hash-table
      probing
    - Better prefetch effectiveness: prefetch no longer needs to access
      `keys[i + HASH_MAP_PREFETCH_DIST]` memory just to compute its hash
---
 be/src/exprs/aggregate/aggregate_function_uniq.h   | 22 +++++++++++++++++-----
 .../aggregate_function_uniq_distribute_key.h       | 22 +++++++++++++++++-----
 2 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/be/src/exprs/aggregate/aggregate_function_uniq.h b/be/src/exprs/aggregate/aggregate_function_uniq.h
index 4f10bc9f2ce..7de1732c1ca 100644
--- a/be/src/exprs/aggregate/aggregate_function_uniq.h
+++ b/be/src/exprs/aggregate/aggregate_function_uniq.h
@@ -164,13 +164,19 @@ public:
             array_of_data_set[i] = &(this->data(places[i] + place_offset).set);
         }
 
+        // Precompute hash values to avoid double computation in prefetch + insert
+        std::vector<size_t> hash_values(batch_size);
+        for (size_t i = 0; i < batch_size; ++i) {
+            hash_values[i] = array_of_data_set[i]->hash(keys[i]);
+        }
+
         for (size_t i = 0; i != batch_size; ++i) {
             if (i + HASH_MAP_PREFETCH_DIST < batch_size) {
-                array_of_data_set[i + HASH_MAP_PREFETCH_DIST]->prefetch(
-                        keys[i + HASH_MAP_PREFETCH_DIST]);
+                array_of_data_set[i + HASH_MAP_PREFETCH_DIST]->prefetch_hash(
+                        hash_values[i + HASH_MAP_PREFETCH_DIST]);
             }
 
-            array_of_data_set[i]->insert(keys[i]);
+            array_of_data_set[i]->emplace_with_hash(hash_values[i], keys[i]);
         }
     }
 
@@ -193,11 +199,17 @@ public:
         const KeyType* keys = get_keys(keys_container, *columns[0], batch_size);
         auto& set = this->data(place).set;
 
+        // Precompute hash values to avoid double computation in prefetch + insert
+        std::vector<size_t> hash_values(batch_size);
+        for (size_t i = 0; i < batch_size; ++i) {
+            hash_values[i] = set.hash(keys[i]);
+        }
+
         for (size_t i = 0; i != batch_size; ++i) {
             if (i + HASH_MAP_PREFETCH_DIST < batch_size) {
-                set.prefetch(keys[i + HASH_MAP_PREFETCH_DIST]);
+                set.prefetch_hash(hash_values[i + HASH_MAP_PREFETCH_DIST]);
             }
-            set.insert(keys[i]);
+            set.emplace_with_hash(hash_values[i], keys[i]);
         }
     }
 
diff --git a/be/src/exprs/aggregate/aggregate_function_uniq_distribute_key.h b/be/src/exprs/aggregate/aggregate_function_uniq_distribute_key.h
index 8d44d74d03e..35c2f5f7f2a 100644
--- a/be/src/exprs/aggregate/aggregate_function_uniq_distribute_key.h
+++ b/be/src/exprs/aggregate/aggregate_function_uniq_distribute_key.h
@@ -122,13 +122,19 @@ public:
             array_of_data_set[i] = &(this->data(places[i] + place_offset).set);
         }
 
+        // Precompute hash values to avoid double computation in prefetch + insert
+        std::vector<size_t> hash_values(batch_size);
+        for (size_t i = 0; i < batch_size; ++i) {
+            hash_values[i] = array_of_data_set[i]->hash(keys[i]);
+        }
+
         for (size_t i = 0; i != batch_size; ++i) {
             if (i + HASH_MAP_PREFETCH_DIST < batch_size) {
-                array_of_data_set[i + HASH_MAP_PREFETCH_DIST]->prefetch(
-                        keys[i + HASH_MAP_PREFETCH_DIST]);
+                array_of_data_set[i + HASH_MAP_PREFETCH_DIST]->prefetch_hash(
+                        hash_values[i + HASH_MAP_PREFETCH_DIST]);
             }
 
-            array_of_data_set[i]->insert(keys[i]);
+            array_of_data_set[i]->emplace_with_hash(hash_values[i], keys[i]);
         }
     }
 
@@ -138,11 +144,17 @@ public:
         const KeyType* keys = get_keys(keys_container, *columns[0], batch_size);
         auto& set = this->data(place).set;
 
+        // Precompute hash values to avoid double computation in prefetch + insert
+        std::vector<size_t> hash_values(batch_size);
+        for (size_t i = 0; i < batch_size; ++i) {
+            hash_values[i] = set.hash(keys[i]);
+        }
+
         for (size_t i = 0; i != batch_size; ++i) {
             if (i + HASH_MAP_PREFETCH_DIST < batch_size) {
-                set.prefetch(keys[i + HASH_MAP_PREFETCH_DIST]);
+                set.prefetch_hash(hash_values[i + HASH_MAP_PREFETCH_DIST]);
             }
-            set.insert(keys[i]);
+            set.emplace_with_hash(hash_values[i], keys[i]);
         }
     }
 

