github-actions[bot] commented on code in PR #60941:
URL: https://github.com/apache/doris/pull/60941#discussion_r2928756901


##########
be/src/exec/common/hash_table/hash_map_context.h:
##########
@@ -112,27 +112,22 @@ struct MethodBaseInner {
     }
 
     template <bool read>
-    ALWAYS_INLINE void prefetch(size_t i) {
+    void prefetch(size_t i) {
         if (LIKELY(i + HASH_MAP_PREFETCH_DIST < hash_values.size())) {

Review Comment:
   **[Medium]** `ALWAYS_INLINE` was removed from `prefetch()`, `find()`, and 
`lazy_emplace()` in `MethodBaseInner`. These are hot-path per-row methods. 
While the new batch path for string hash maps bypasses them, the 
**non-string-hash-map path** still calls these per-row (see the `else` branches 
in `lazy_emplace_batch` / `find_batch` / `lazy_emplace_batch_void`). Removing 
`ALWAYS_INLINE` could cause the compiler to not inline these small functions, 
potentially regressing performance for non-string hash maps (e.g., numeric key 
aggregation).
   
   Was this removal intentional? If so, do you have benchmark data showing no 
regression for non-string hash map cases? Consider keeping `ALWAYS_INLINE` on 
these methods.
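If the hint is restored, a compilable sketch of the suggested shape looks like the following. The `ALWAYS_INLINE` / `LIKELY` macro definitions and `HASH_MAP_PREFETCH_DIST` value below are stand-ins for illustration only; the real definitions live elsewhere in the BE common headers, and the real method prefetches hash-table buckets rather than the hash array itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins for Doris's portability macros, shown only so the sketch compiles.
#define ALWAYS_INLINE __attribute__((always_inline)) inline
#define LIKELY(x) __builtin_expect(!!(x), 1)
static constexpr size_t HASH_MAP_PREFETCH_DIST = 16;

struct MethodBaseInnerSketch {
    std::vector<size_t> hash_values;

    // Keeping the hint on the per-row hot path: the attribute forces inlining
    // even when the compiler's cost model would otherwise decline.
    template <bool read>
    ALWAYS_INLINE void prefetch(size_t i) {
        if (LIKELY(i + HASH_MAP_PREFETCH_DIST < hash_values.size())) {
            // Prefetch the upcoming entry's cache line (stand-in for the
            // real bucket prefetch). rw hint: 0 = read, 1 = write.
            if constexpr (read) {
                __builtin_prefetch(&hash_values[i + HASH_MAP_PREFETCH_DIST], 0);
            } else {
                __builtin_prefetch(&hash_values[i + HASH_MAP_PREFETCH_DIST], 1);
            }
        }
    }
};
```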



##########
be/src/exec/common/hash_table/string_hash_table.h:
##########
@@ -673,6 +673,28 @@ class StringHashTable : private boost::noncopyable {
     const_iterator cend() const { return end(); }
     iterator end() { return iterator(this, true); }
 
+    /// Public accessors for sub-tables, enabling direct batch operations
+    /// that bypass dispatch() for better performance (no per-row branching).
+    T0& get_submap_m0() { return m0; }
+    T1& get_submap_m1() { return m1; }

Review Comment:
   **[Nit]** These six `get_submap_*()` accessors appear to be dead code: 
nothing in this PR (or the existing codebase) calls them. Only 
`visit_submaps()` is used by the batch-processing code. Consider removing them 
to keep the interface minimal, or add a comment explaining their intended 
future use.
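For illustration, a toy model of the interface (not the real `StringHashTable`; the actual `visit_submaps()` signature may differ) showing why the visitor alone suffices for batch callers, which never need to name an individual sub-table:

```cpp
#include <cassert>
#include <vector>

// Toy stand-in: six sub-tables plus a single visitor entry point.
struct ToyStringHashTable {
    // Stand-ins for the sub-tables m0..m4 and ms.
    std::vector<int> m0, m1, m2, m3, m4, ms;

    // Hypothetical shape of visit_submaps(): the callback receives each
    // sub-table together with its group index, so per-submap accessors
    // are never needed by callers.
    template <typename Func>
    void visit_submaps(Func&& func) {
        func(0, m0);
        func(1, m1);
        func(2, m2);
        func(3, m3);
        func(4, m4);
        func(5, ms);
    }
};
```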



##########
be/src/exec/common/hash_table/hash_map_context.h:
##########
@@ -292,6 +287,56 @@ struct MethodSerialized : public MethodBase<TData> {
     }
 };
 
+/// Sub-table group indices for StringHashTable batch operations.
+/// StringHashTable dispatches keys to 6 sub-tables by string length:
+///   group 0: empty strings (size == 0) → m0
+///   group 1: size <= 2 → m1
+///   group 2: size <= 4 → m2
+///   group 3: size <= 8 → m3
+///   group 4: size <= 16 → m4
+///   group 5: size > 16 or trailing zero → ms
+/// By pre-grouping row indices, we can process each sub-table in a batch,
+/// achieving better cache locality and enabling prefetch within each group.
+struct StringKeySubTableGroups {
+    static constexpr int NUM_GROUPS = 6;
+    // Row indices for each sub-table group
+    DorisVector<uint32_t> group_row_indices[NUM_GROUPS];
+
+    void build(const StringRef* keys, uint32_t num_rows) {
+        for (int g = 0; g < NUM_GROUPS; g++) {
+            group_row_indices[g].clear();
+        }
+        // First pass: count sizes for each group to reserve memory
+        uint32_t counts[NUM_GROUPS] = {};
+        for (uint32_t i = 0; i < num_rows; i++) {
+            counts[get_group(keys[i])]++;
+        }
+        for (int g = 0; g < NUM_GROUPS; g++) {
+            group_row_indices[g].reserve(counts[g]);
+        }
+        // Second pass: fill group indices
+        for (uint32_t i = 0; i < num_rows; i++) {
+            group_row_indices[get_group(keys[i])].push_back(i);
+        }
+    }
+
+    static ALWAYS_INLINE int get_group(const StringRef& key) {
+        const size_t sz = key.size;
+        if (sz == 0) {
+            return 0;
+        }
+        if (key.data[sz - 1] == 0) {
+            // Trailing zero: goes to the generic long-string table (ms)
+            return 5;
+        }
+        if (sz <= 2) return 1;
+        if (sz <= 4) return 2;

Review Comment:
   **[Nit / Robustness]** These thresholds (2, 4, 8, 16) are hardcoded magic 
numbers that must stay in sync with `StringHashTable::dispatch()`, which uses 
`sizeof(StringKey2)`, `sizeof(StringKey4)`, `sizeof(StringKey8)`, 
`sizeof(StringKey16)`. Today they match, but if the key types ever change, this 
grouping would silently diverge from `dispatch()` routing.
   
   Consider using `sizeof(StringKey2)`, `sizeof(StringKey4)`, etc. (or a shared 
constant) instead of magic numbers, to make the coupling explicit and 
self-maintaining:
   ```cpp
   if (sz <= sizeof(StringKey2)) return 1;
   if (sz <= sizeof(StringKey4)) return 2;
   if (sz <= sizeof(StringKey8)) return 3;
   if (sz <= sizeof(StringKey16)) return 4;
   ```
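A self-contained sketch of that suggestion, with a compile-time guard added. The `StringKey*` definitions below are stand-ins chosen to match the current sizes (the real types are defined in `string_hash_table.h`), and `get_group_sketch` is a hypothetical free-function rendering of `get_group`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-ins for the fixed-size key types; only their sizes matter here.
using StringKey2 = uint16_t;
using StringKey4 = uint32_t;
using StringKey8 = uint64_t;
struct StringKey16 { uint64_t low, high; };

// Compile-time guard: if a key type ever changes size, the build breaks here
// instead of the grouping silently diverging from dispatch() routing.
static_assert(sizeof(StringKey2) == 2 && sizeof(StringKey4) == 4 &&
                      sizeof(StringKey8) == 8 && sizeof(StringKey16) == 16,
              "group thresholds out of sync with StringHashTable key types");

// get_group() rewritten against sizeof(), per the suggestion above.
inline int get_group_sketch(const char* data, size_t sz) {
    if (sz == 0) return 0;           // empty string -> m0
    if (data[sz - 1] == 0) return 5; // trailing zero -> generic table (ms)
    if (sz <= sizeof(StringKey2)) return 1;
    if (sz <= sizeof(StringKey4)) return 2;
    if (sz <= sizeof(StringKey8)) return 3;
    if (sz <= sizeof(StringKey16)) return 4;
    return 5;
}
```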



##########
be/src/exec/common/hash_table/hash_map_context.h:
##########
@@ -365,8 +417,305 @@ struct MethodStringNoCache : public MethodBase<TData> {
         key_columns[0]->reserve(num_rows);
         key_columns[0]->insert_many_strings(input_keys.data(), num_rows);
     }
+
+    const StringKeySubTableGroups& get_sub_table_groups() const { return _sub_table_groups; }
+};
+
+/// Helper: detect whether HashMap is a nullable-wrapped StringHashMap.
+template <typename HashMap>
+struct IsNullableStringHashMap : std::false_type {};
+
+template <typename Mapped, typename Allocator>
+struct IsNullableStringHashMap<DataWithNullKey<StringHashMap<Mapped, Allocator>>> : std::true_type {
 };
 
+/// Helper: get the underlying StringHashTable from a hash table (handles DataWithNullKey wrapper).
+template <typename HashMap>
+auto& get_string_hash_table(HashMap& data) {
+    return data;
+}
+
+/// Compile-time key conversion for each sub-table group.
+/// Groups 1-4 use to_string_key<T>(); groups 0 and 5 use StringRef directly.
+/// Returns the converted key for the given group.
+/// For groups 0 and 5, the key is returned as a non-const copy (lazy_emplace_if_zero takes Key&).
+template <int GroupIdx>
+auto convert_key_for_submap(const StringRef& origin) {
+    if constexpr (GroupIdx == 0) {
+        return StringRef(origin); // copy — m0 needs non-const Key&
+    } else if constexpr (GroupIdx == 1) {
+        return to_string_key<StringKey2>(origin);
+    } else if constexpr (GroupIdx == 2) {
+        return to_string_key<StringKey4>(origin);
+    } else if constexpr (GroupIdx == 3) {
+        return to_string_key<StringKey8>(origin);
+    } else if constexpr (GroupIdx == 4) {
+        return to_string_key<StringKey16>(origin);
+    } else {
+        return StringRef(origin); // copy — ms uses StringRef as Key
+    }
+}
+
+/// Hash value to use for a given group. Group 0 (empty string) always uses hash=0.
+template <int GroupIdx, typename HashValues>
+size_t hash_for_group(const HashValues& hash_values, uint32_t row) {
+    if constexpr (GroupIdx == 0) {
+        return 0;
+    } else {
+        return hash_values[row];
+    }
+}
+
+/// Whether prefetch is useful for a group. Group 0 (StringHashTableEmpty, at most 1 element)
+/// does not benefit from prefetch.
+template <int GroupIdx>
+static constexpr bool group_needs_prefetch = (GroupIdx != 0);
+
+/// Process one sub-table group for emplace with result_handler.
+/// Handles nullable null-key check, prefetch, key conversion, and emplace.
+/// pre_handler(row) is called before each emplace, allowing callers to set per-row state
+/// (e.g., current row index used inside creator lambdas).
+template <int GroupIdx, bool is_nullable, typename Submap, typename HashMethodType, typename State,
+          typename HashMap, typename F, typename FF, typename PreHandler, typename ResultHandler>
+void process_submap_emplace(Submap& submap, const uint32_t* indices, size_t count,
+                            HashMethodType& agg_method, State& state, HashMap& hash_table,
+                            F&& creator, FF&& creator_for_null_key, PreHandler&& pre_handler,
+                            ResultHandler&& result_handler) {
+    using Mapped = typename HashMethodType::Mapped;
+    for (size_t j = 0; j < count; j++) {
+        if constexpr (group_needs_prefetch<GroupIdx>) {
+            if (j + HASH_MAP_PREFETCH_DIST < count) {
+                submap.template prefetch<false>(
+                        agg_method.hash_values[indices[j + HASH_MAP_PREFETCH_DIST]]);
+            }
+        }
+        uint32_t row = indices[j];
+        pre_handler(row);
+        if constexpr (is_nullable) {
+            if (state.key_column->is_null_at(row)) {
+                bool has_null_key = hash_table.has_null_key_data();
+                hash_table.has_null_key_data() = true;
+                if (!has_null_key) {
+                    std::forward<FF>(creator_for_null_key)(
+                            hash_table.template get_null_key_data<Mapped>());

Review Comment:
   **[Nit / Code hygiene]** `std::forward<FF>(creator_for_null_key)` is used 
inside a loop here (and similarly for `std::forward<F>(creator)` below). While 
currently safe, because the callables passed from `visit_submaps` are always 
lvalues (lambdas capturing by `[&]`), using `std::forward` inside a loop is a 
known anti-pattern: if someone later passes an rvalue, the first iteration 
could move from the object, leaving it in a moved-from state for subsequent 
iterations.
   
   Since these are always called with lvalue references in practice, you could 
simply drop the `std::forward` wrappers and pass by plain reference, or add a 
comment explaining why this is safe in this context. This applies to all 
`std::forward<F>(creator)` and `std::forward<FF>(creator_for_null_key)` calls 
inside loops in `process_submap_emplace`, `process_submap_emplace_void`, and 
their callers.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

