github-actions[bot] commented on code in PR #60941:
URL: https://github.com/apache/doris/pull/60941#discussion_r2877398943
##########
be/src/vec/common/hash_table/hash_map_context.h:
##########
@@ -292,6 +287,56 @@ struct MethodSerialized : public MethodBase<TData> {
}
};
+/// Sub-table group indices for StringHashTable batch operations.
+/// StringHashTable dispatches keys to 6 sub-tables by string length:
+/// group 0: empty strings (size == 0) → m0
+/// group 1: size <= 2 → m1
+/// group 2: size <= 4 → m2
+/// group 3: size <= 8 → m3
+/// group 4: size <= 16 → m4
+/// group 5: size > 16 or trailing zero → ms
+/// By pre-grouping row indices, we can process each sub-table in a batch,
+/// achieving better cache locality and enabling prefetch within each group.
+struct StringKeySubTableGroups {
+ static constexpr int NUM_GROUPS = 6;
Review Comment:
**[Nit]** The `build()` method iterates through all keys twice: once to
count per-group sizes, once to fill. For very large batches the extra pass
is cheap, and the counts from the first pass let each group's index vector
be reserved exactly once before the `push_back` fill, so no reallocation
occurs. This is clean.
One minor observation: null rows (when nullable) are still assigned to a
sub-table group based on their underlying nested column data (which may be
arbitrary for null rows). This is correct because the `process_submap_*`
functions check `is_null_at(row)` before touching the sub-table, but it does
mean null rows are scattered across groups rather than being handled in a
separate pass. This is fine for correctness but could be a tiny missed
optimization opportunity — grouping nulls separately could skip them faster.
##########
be/src/vec/common/hash_table/hash_map_context.h:
##########
@@ -365,8 +417,305 @@ struct MethodStringNoCache : public MethodBase<TData> {
key_columns[0]->reserve(num_rows);
key_columns[0]->insert_many_strings(input_keys.data(), num_rows);
}
+
+    const StringKeySubTableGroups& get_sub_table_groups() const { return _sub_table_groups; }
+};
+
+/// Helper: detect whether HashMap is a nullable-wrapped StringHashMap.
+template <typename HashMap>
+struct IsNullableStringHashMap : std::false_type {};
+
+template <typename Mapped, typename Allocator>
+struct IsNullableStringHashMap<DataWithNullKey<StringHashMap<Mapped, Allocator>>>
+        : std::true_type {};
+/// Helper: get the underlying StringHashTable from a hash table (handles DataWithNullKey wrapper).
+template <typename HashMap>
+auto& get_string_hash_table(HashMap& data) {
+ return data;
+}
+
+/// Compile-time key conversion for each sub-table group.
+/// Groups 1-4 use to_string_key<T>(); groups 0 and 5 use StringRef directly.
+/// Returns the converted key for the given group.
+/// For groups 0 and 5, the key is returned as a non-const copy (lazy_emplace_if_zero takes Key&).
+template <int GroupIdx>
+auto convert_key_for_submap(const StringRef& origin) {
+ if constexpr (GroupIdx == 0) {
+ return StringRef(origin); // copy — m0 needs non-const Key&
+ } else if constexpr (GroupIdx == 1) {
+ return to_string_key<StringKey2>(origin);
+ } else if constexpr (GroupIdx == 2) {
+ return to_string_key<StringKey4>(origin);
+ } else if constexpr (GroupIdx == 3) {
+ return to_string_key<StringKey8>(origin);
+ } else if constexpr (GroupIdx == 4) {
+ return to_string_key<StringKey16>(origin);
+ } else {
+ return StringRef(origin); // copy — ms uses StringRef as Key
+ }
+}
+
+/// Hash value to use for a given group. Group 0 (empty string) always uses hash=0.
+template <int GroupIdx, typename HashValues>
+size_t hash_for_group(const HashValues& hash_values, uint32_t row) {
+ if constexpr (GroupIdx == 0) {
+ return 0;
+ } else {
+ return hash_values[row];
+ }
+}
+
+/// Whether prefetch is useful for a group. Group 0 (StringHashTableEmpty, at most 1 element)
+/// does not benefit from prefetch.
+template <int GroupIdx>
+static constexpr bool group_needs_prefetch = (GroupIdx != 0);
+
+/// Process one sub-table group for emplace with result_handler.
+/// Handles nullable null-key check, prefetch, key conversion, and emplace.
+/// pre_handler(row) is called before each emplace, allowing callers to set per-row state
+/// (e.g., current row index used inside creator lambdas).
+template <int GroupIdx, bool is_nullable, typename Submap, typename HashMethodType,
+          typename State, typename HashMap, typename F, typename FF,
+          typename PreHandler, typename ResultHandler>
+void process_submap_emplace(Submap& submap, const uint32_t* indices, size_t count,
+                            HashMethodType& agg_method, State& state, HashMap& hash_table,
+                            F&& creator, FF&& creator_for_null_key, PreHandler&& pre_handler,
+                            ResultHandler&& result_handler) {
+ using Mapped = typename HashMethodType::Mapped;
+ for (size_t j = 0; j < count; j++) {
+ if constexpr (group_needs_prefetch<GroupIdx>) {
+ if (j + HASH_MAP_PREFETCH_DIST < count) {
+ submap.template prefetch<false>(
+                        agg_method.hash_values[indices[j + HASH_MAP_PREFETCH_DIST]]);
+ }
+ }
+ uint32_t row = indices[j];
+ pre_handler(row);
+ if constexpr (is_nullable) {
+ if (state.key_column->is_null_at(row)) {
+ bool has_null_key = hash_table.has_null_key_data();
+ hash_table.has_null_key_data() = true;
+ if (!has_null_key) {
+ std::forward<FF>(creator_for_null_key)(
+ hash_table.template get_null_key_data<Mapped>());
+ }
+                result_handler(row, hash_table.template get_null_key_data<Mapped>());
+ continue;
+ }
+ }
+ const auto& origin = agg_method.keys[row];
+ auto converted_key = convert_key_for_submap<GroupIdx>(origin);
+ typename Submap::LookupResult result;
+ if constexpr (GroupIdx == 0 || GroupIdx == 5) {
+ // Groups 0,5: key and origin are the same StringRef
+            submap.lazy_emplace_with_origin(converted_key, converted_key, result,
+                                            hash_for_group<GroupIdx>(agg_method.hash_values, row),
+                                            std::forward<F>(creator));
+ } else {
Review Comment:
**[Medium]** `std::forward<F>(creator)` is called inside a loop (for each
row in the sub-table group), and this function is also called from
`visit_submaps` which invokes it up to 6 times (once per sub-table group).
While this is currently safe because all callers pass lvalue references through
the `[&]` capture in `visit_submaps`, the function signature (`F&& creator`)
combined with `std::forward` in a loop is a well-known footgun pattern.
If anyone ever calls `process_submap_emplace` directly with an rvalue
callable, the first `std::forward` would move from it, and all subsequent
iterations would use a moved-from object.
Consider either:
- Removing `std::forward` and just passing `creator` directly (since it's
always an lvalue in practice)
- Or accepting `const F& creator` to make the contract explicit
Same issue applies to `process_submap_emplace_void` below.
##########
be/src/vec/common/hash_table/string_hash_table.h:
##########
@@ -673,6 +673,28 @@ class StringHashTable : private boost::noncopyable {
const_iterator cend() const { return end(); }
iterator end() { return iterator(this, true); }
+ /// Public accessors for sub-tables, enabling direct batch operations
+ /// that bypass dispatch() for better performance (no per-row branching).
+ T0& get_submap_m0() { return m0; }
+ T1& get_submap_m1() { return m1; }
Review Comment:
**[Low]** These 6 individual `get_submap_m0()` through `get_submap_ms()`
accessors appear to be unused — all batch operations go through
`visit_submaps()` instead. If they are not needed, consider removing them to
reduce the public API surface. If they are intended for future use, a comment
explaining the intent would be helpful.
##########
be/src/vec/common/hash_table/hash_map_context.h:
##########
@@ -357,6 +405,10 @@ struct MethodStringNoCache : public MethodBase<TData> {
Base::init_join_bucket_num(num_rows, bucket_size, null_map);
} else {
Base::init_hash_values(num_rows, null_map);
+            // Build sub-table groups for batch emplace/find (only for aggregation, not join)
+ if constexpr (Base::is_string_hash_map()) {
+ _sub_table_groups.build(Base::keys, num_rows);
+ }
Review Comment:
**[Note]** `_sub_table_groups.build()` is called during
`init_serialized_keys()` for all aggregation paths (not just those using the
batch API). This means even if a code path doesn't use
`lazy_emplace_batch`/`find_batch`, it still pays the cost of building sub-table
groups. The cost is O(2*num_rows) which is small relative to the hash
operations, but worth noting.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]