github-actions[bot] commented on code in PR #60941:
URL: https://github.com/apache/doris/pull/60941#discussion_r2928756901
##########
be/src/exec/common/hash_table/hash_map_context.h:
##########
@@ -112,27 +112,22 @@ struct MethodBaseInner {
}
template <bool read>
- ALWAYS_INLINE void prefetch(size_t i) {
+ void prefetch(size_t i) {
if (LIKELY(i + HASH_MAP_PREFETCH_DIST < hash_values.size())) {
Review Comment:
**[Medium]** `ALWAYS_INLINE` was removed from `prefetch()`, `find()`, and
`lazy_emplace()` in `MethodBaseInner`. These are hot-path per-row methods.
While the new batch path for string hash maps bypasses them, the
**non-string-hash-map path** still calls these per-row (see the `else` branches
in `lazy_emplace_batch` / `find_batch` / `lazy_emplace_batch_void`). Removing
`ALWAYS_INLINE` could cause the compiler to not inline these small functions,
potentially regressing performance for non-string hash maps (e.g., numeric key
aggregation).
Was this removal intentional? If so, do you have benchmark data showing no
regression for non-string hash map cases? Consider keeping `ALWAYS_INLINE` on
these methods.
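For reference, keeping the attribute would look like the following minimal, self-contained sketch. It assumes `ALWAYS_INLINE` expands to the usual GCC/Clang `always_inline` attribute (check Doris's actual macro definition) and uses a stand-in value for `HASH_MAP_PREFETCH_DIST`:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed definition; Doris defines its own macro in its compiler headers.
#define ALWAYS_INLINE __attribute__((always_inline)) inline

static constexpr size_t HASH_MAP_PREFETCH_DIST = 16; // stand-in value

struct Demo {
    std::vector<size_t> hash_values;

    // Keeping the attribute forces inlining of this tiny per-row helper
    // even when the compiler's heuristics would otherwise decline.
    template <bool read>
    ALWAYS_INLINE void prefetch(size_t i) {
        if (i + HASH_MAP_PREFETCH_DIST < hash_values.size()) {
            // rw = 0 for read, 1 for write; both must be compile-time constants.
            __builtin_prefetch(&hash_values[i + HASH_MAP_PREFETCH_DIST],
                               read ? 0 : 1, /*locality=*/1);
        }
    }
};
```

The attribute costs nothing when the compiler would have inlined anyway, so retaining it on these per-row helpers is a low-risk default.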
##########
be/src/exec/common/hash_table/string_hash_table.h:
##########
@@ -673,6 +673,28 @@ class StringHashTable : private boost::noncopyable {
const_iterator cend() const { return end(); }
iterator end() { return iterator(this, true); }
+ /// Public accessors for sub-tables, enabling direct batch operations
+ /// that bypass dispatch() for better performance (no per-row branching).
+ T0& get_submap_m0() { return m0; }
+ T1& get_submap_m1() { return m1; }
Review Comment:
**[Nit]** These 6 individual `get_submap_*()` accessors appear to be dead
code — nowhere in this PR (or the existing codebase) calls them. Only
`visit_submaps()` is used by the batch processing code. Consider removing them
to keep the interface minimal, or add a comment explaining their intended
future use.
##########
be/src/exec/common/hash_table/hash_map_context.h:
##########
@@ -292,6 +287,56 @@ struct MethodSerialized : public MethodBase<TData> {
}
};
+/// Sub-table group indices for StringHashTable batch operations.
+/// StringHashTable dispatches keys to 6 sub-tables by string length:
+/// group 0: empty strings (size == 0) → m0
+/// group 1: size <= 2 → m1
+/// group 2: size <= 4 → m2
+/// group 3: size <= 8 → m3
+/// group 4: size <= 16 → m4
+/// group 5: size > 16 or trailing zero → ms
+/// By pre-grouping row indices, we can process each sub-table in a batch,
+/// achieving better cache locality and enabling prefetch within each group.
+struct StringKeySubTableGroups {
+ static constexpr int NUM_GROUPS = 6;
+ // Row indices for each sub-table group
+ DorisVector<uint32_t> group_row_indices[NUM_GROUPS];
+
+ void build(const StringRef* keys, uint32_t num_rows) {
+ for (int g = 0; g < NUM_GROUPS; g++) {
+ group_row_indices[g].clear();
+ }
+ // First pass: count sizes for each group to reserve memory
+ uint32_t counts[NUM_GROUPS] = {};
+ for (uint32_t i = 0; i < num_rows; i++) {
+ counts[get_group(keys[i])]++;
+ }
+ for (int g = 0; g < NUM_GROUPS; g++) {
+ group_row_indices[g].reserve(counts[g]);
+ }
+ // Second pass: fill group indices
+ for (uint32_t i = 0; i < num_rows; i++) {
+ group_row_indices[get_group(keys[i])].push_back(i);
+ }
+ }
+
+ static ALWAYS_INLINE int get_group(const StringRef& key) {
+ const size_t sz = key.size;
+ if (sz == 0) {
+ return 0;
+ }
+ if (key.data[sz - 1] == 0) {
+ // Trailing zero: goes to the generic long-string table (ms)
+ return 5;
+ }
+ if (sz <= 2) return 1;
+ if (sz <= 4) return 2;
Review Comment:
**[Nit / Robustness]** These thresholds (2, 4, 8, 16) are hardcoded magic
numbers that must stay in sync with `StringHashTable::dispatch()`, which uses
`sizeof(StringKey2)`, `sizeof(StringKey4)`, `sizeof(StringKey8)`,
`sizeof(StringKey16)`. Today they match, but if the key types ever change, this
grouping would silently diverge from `dispatch()` routing.
Consider using `sizeof(StringKey2)`, `sizeof(StringKey4)`, etc. (or a shared
constant) instead of magic numbers, to make the coupling explicit and
self-maintaining:
```cpp
if (sz <= sizeof(StringKey2)) return 1;
if (sz <= sizeof(StringKey4)) return 2;
if (sz <= sizeof(StringKey8)) return 3;
if (sz <= sizeof(StringKey16)) return 4;
```
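The coupling can also be enforced at compile time. A minimal standalone sketch with stand-in key types (the real Doris `StringKey*` definitions may differ; this only illustrates the idea):

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in fixed-size key types (illustrative only).
struct StringKey2 { uint16_t v; };
struct StringKey4 { uint32_t v; };
struct StringKey8 { uint64_t v; };
struct StringKey16 { uint64_t lo, hi; };

// Grouping expressed via sizeof stays in sync with the key types by construction.
constexpr int get_group_by_size(size_t sz) {
    if (sz == 0) return 0;
    if (sz <= sizeof(StringKey2)) return 1;
    if (sz <= sizeof(StringKey4)) return 2;
    if (sz <= sizeof(StringKey8)) return 3;
    if (sz <= sizeof(StringKey16)) return 4;
    return 5;
}

// Compile-time guard: fails to build if a key type's size ever changes
// in a way that moves the group boundaries.
static_assert(get_group_by_size(2) == 1 && get_group_by_size(4) == 2 &&
              get_group_by_size(8) == 3 && get_group_by_size(16) == 4 &&
              get_group_by_size(17) == 5,
              "sub-table grouping diverged from key type sizes");
```

Even if the literals are kept, a `static_assert` like the one above would catch silent divergence from `dispatch()`.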
##########
be/src/exec/common/hash_table/hash_map_context.h:
##########
@@ -365,8 +417,305 @@ struct MethodStringNoCache : public MethodBase<TData> {
key_columns[0]->reserve(num_rows);
key_columns[0]->insert_many_strings(input_keys.data(), num_rows);
}
+
+    const StringKeySubTableGroups& get_sub_table_groups() const { return _sub_table_groups; }
+};
+
+/// Helper: detect whether HashMap is a nullable-wrapped StringHashMap.
+template <typename HashMap>
+struct IsNullableStringHashMap : std::false_type {};
+
+template <typename Mapped, typename Allocator>
+struct IsNullableStringHashMap<DataWithNullKey<StringHashMap<Mapped, Allocator>>> : std::true_type {};
+/// Helper: get the underlying StringHashTable from a hash table (handles DataWithNullKey wrapper).
+template <typename HashMap>
+auto& get_string_hash_table(HashMap& data) {
+ return data;
+}
+
+/// Compile-time key conversion for each sub-table group.
+/// Groups 1-4 use to_string_key<T>(); groups 0 and 5 use StringRef directly.
+/// Returns the converted key for the given group.
+/// For groups 0 and 5, the key is returned as a non-const copy (lazy_emplace_if_zero takes Key&).
+template <int GroupIdx>
+auto convert_key_for_submap(const StringRef& origin) {
+ if constexpr (GroupIdx == 0) {
+ return StringRef(origin); // copy — m0 needs non-const Key&
+ } else if constexpr (GroupIdx == 1) {
+ return to_string_key<StringKey2>(origin);
+ } else if constexpr (GroupIdx == 2) {
+ return to_string_key<StringKey4>(origin);
+ } else if constexpr (GroupIdx == 3) {
+ return to_string_key<StringKey8>(origin);
+ } else if constexpr (GroupIdx == 4) {
+ return to_string_key<StringKey16>(origin);
+ } else {
+ return StringRef(origin); // copy — ms uses StringRef as Key
+ }
+}
+
+/// Hash value to use for a given group. Group 0 (empty string) always uses hash=0.
+template <int GroupIdx, typename HashValues>
+size_t hash_for_group(const HashValues& hash_values, uint32_t row) {
+ if constexpr (GroupIdx == 0) {
+ return 0;
+ } else {
+ return hash_values[row];
+ }
+}
+
+/// Whether prefetch is useful for a group. Group 0 (StringHashTableEmpty, at most 1 element)
+/// does not benefit from prefetch.
+template <int GroupIdx>
+static constexpr bool group_needs_prefetch = (GroupIdx != 0);
+
+/// Process one sub-table group for emplace with result_handler.
+/// Handles nullable null-key check, prefetch, key conversion, and emplace.
+/// pre_handler(row) is called before each emplace, allowing callers to set per-row state
+/// (e.g., current row index used inside creator lambdas).
+template <int GroupIdx, bool is_nullable, typename Submap, typename HashMethodType, typename State,
+          typename HashMap, typename F, typename FF, typename PreHandler, typename ResultHandler>
+void process_submap_emplace(Submap& submap, const uint32_t* indices, size_t count,
+                            HashMethodType& agg_method, State& state, HashMap& hash_table,
+                            F&& creator, FF&& creator_for_null_key, PreHandler&& pre_handler,
+                            ResultHandler&& result_handler) {
+ using Mapped = typename HashMethodType::Mapped;
+ for (size_t j = 0; j < count; j++) {
+ if constexpr (group_needs_prefetch<GroupIdx>) {
+ if (j + HASH_MAP_PREFETCH_DIST < count) {
+ submap.template prefetch<false>(
+                        agg_method.hash_values[indices[j + HASH_MAP_PREFETCH_DIST]]);
+ }
+ }
+ uint32_t row = indices[j];
+ pre_handler(row);
+ if constexpr (is_nullable) {
+ if (state.key_column->is_null_at(row)) {
+ bool has_null_key = hash_table.has_null_key_data();
+ hash_table.has_null_key_data() = true;
+ if (!has_null_key) {
+ std::forward<FF>(creator_for_null_key)(
+                            hash_table.template get_null_key_data<Mapped>());
Review Comment:
**[Nit / Code hygiene]** `std::forward<FF>(creator_for_null_key)` is used
inside a loop here (and similarly `std::forward<F>(creator)` at positions
below). While currently safe because the lambdas passed from `visit_submaps`
are lvalue references (captured by `[&]`), using `std::forward` inside a loop
is a known anti-pattern: if a caller ever passes an rvalue, the first iteration
may move from the object, leaving it in a moved-from state for all subsequent
iterations.
Since these are always called with lvalue references in practice, you could
simply drop the `std::forward` wrappers and pass by plain reference, or add a
comment explaining why this is safe in this context. This applies to all
`std::forward<F>(creator)` and `std::forward<FF>(creator_for_null_key)` calls
inside loops in `process_submap_emplace`, `process_submap_emplace_void`, and
their callers.
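The hazard is easy to reproduce with a move-only type. In this standalone sketch (all names hypothetical), an rvalue argument is moved from on the first loop pass:

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Sink that takes its argument by value, so an rvalue is moved into it.
void consume(std::unique_ptr<int> p, std::vector<int>& out) {
    out.push_back(p ? *p : -1); // -1 marks a moved-from (null) pointer
}

// Anti-pattern: std::forward on the same parameter every iteration.
// For an rvalue argument, iteration 0 moves from `p`; later iterations
// see only the moved-from (guaranteed null) unique_ptr.
template <typename P>
void call_n_times(P&& p, int n, std::vector<int>& out) {
    for (int i = 0; i < n; ++i) {
        consume(std::forward<P>(p), out);
    }
}
```

Calling `call_n_times(std::make_unique<int>(42), 3, out)` fills `out` with `{42, -1, -1}`: only the first iteration sees the value. Taking the callable by plain lvalue reference, or forwarding only on the final iteration, avoids the trap.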
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]