This is an automated email from the ASF dual-hosted git repository.
zanmato pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push:
new e7471f85b4 MINOR: [C++] Fix typos in key_map doc and comment (#45104)
e7471f85b4 is described below
commit e7471f85b45a0f547bf5395bd98f8d6d8aa0af78
Author: Rossi Sun <[email protected]>
AuthorDate: Tue Dec 24 14:21:54 2024 +0800
MINOR: [C++] Fix typos in key_map doc and comment (#45104)
### Rationale for this change
### What changes are included in this PR?
### Are these changes tested?
### Are there any user-facing changes?
Authored-by: Rossi Sun <[email protected]>
Signed-off-by: Rossi Sun <[email protected]>
---
cpp/src/arrow/acero/doc/key_map.md | 2 +-
cpp/src/arrow/compute/key_map_internal.cc | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/cpp/src/arrow/acero/doc/key_map.md
b/cpp/src/arrow/acero/doc/key_map.md
index fdedc88c4d..a676343bbe 100644
--- a/cpp/src/arrow/acero/doc/key_map.md
+++ b/cpp/src/arrow/acero/doc/key_map.md
@@ -65,7 +65,7 @@ Every slot can either be **empty** or store data related to a
single inserted ke
Status byte, as the name suggests, stores 8 bits. The highest bit indicates if
the slot is empty (the highest bit is set) or corresponds to one of inserted
keys (the highest bit is zero). The remaining 7 bits contain 7 bits of key hash
that we call a **stamp**. The stamp is used to eliminate some false positives
when searching for a matching key for a given input. Slot also stores **key
id**, which is a non-negative integer smaller than the number of inserted keys,
that is used as a refe [...]
-A single block contains 8 slots and can be viewed as a micro-stack of up to 8
inserted keys. When the first key is inserted into an empty block, it will
occupy a slot with local id 0. The second inserted key will go into slot number
1 and so on. We use N highest bits of hash to get an index of a **start
block**, when searching for a match or an empty slot to insert a previously not
seen key when that is the case. If the start block contains any empty slots,
then the search for either a m [...]
+A single block contains 8 slots and can be viewed as a micro-stack of up to 8
inserted keys. When the first key is inserted into an empty block, it will
occupy a slot with local id 0. The second inserted key will go into slot number
1 and so on. We use N highest bits of hash to get an index of a **start
block**, when searching for a match or an empty slot to insert a previously not
seen key when that is the case. If the start block contains any empty slots,
then the search for either a m [...]
The most interesting part of each block is the set of status bytes of its
slots, which is simply a single 64-bit word. The implementation of efficient
searches across these bytes during lookups require using either leading zero
count or trailing zero count intrinsic. Since there are cases when only the
first one is available, in order to take advantage of it, we order the bytes in
the 64-bit status word so that the first slot within a block uses the highest
byte and the last one uses the [...]
diff --git a/cpp/src/arrow/compute/key_map_internal.cc
b/cpp/src/arrow/compute/key_map_internal.cc
index 9e6d60ab50..81f1543cef 100644
--- a/cpp/src/arrow/compute/key_map_internal.cc
+++ b/cpp/src/arrow/compute/key_map_internal.cc
@@ -281,7 +281,7 @@ void SwissTable::early_filter_imp(const int num_keys, const
uint32_t* hashes,
// When we reach this limit, we need to break processing of any further rows
and resize.
//
uint64_t SwissTable::num_groups_for_resize() const {
- // Resize small hash tables when 50% full (up to 12KB).
+ // Resize small hash tables when 50% full (up to 32KB).
// Resize large hash tables when 75% full.
constexpr int log_blocks_small_ = 9;
uint64_t num_slots = 1ULL << (log_blocks_ + 3);