This is an automated email from the ASF dual-hosted git repository.

zanmato pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
     new e7471f85b4 MINOR: [C++] Fix typos in key_map doc and comment (#45104)
e7471f85b4 is described below

commit e7471f85b45a0f547bf5395bd98f8d6d8aa0af78
Author: Rossi Sun <[email protected]>
AuthorDate: Tue Dec 24 14:21:54 2024 +0800

    MINOR: [C++] Fix typos in key_map doc and comment (#45104)
    
    
    
    ### Rationale for this change
    
    ### What changes are included in this PR?
    
    ### Are these changes tested?
    
    ### Are there any user-facing changes?
    
    Authored-by: Rossi Sun <[email protected]>
    Signed-off-by: Rossi Sun <[email protected]>
---
 cpp/src/arrow/acero/doc/key_map.md        | 2 +-
 cpp/src/arrow/compute/key_map_internal.cc | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/cpp/src/arrow/acero/doc/key_map.md 
b/cpp/src/arrow/acero/doc/key_map.md
index fdedc88c4d..a676343bbe 100644
--- a/cpp/src/arrow/acero/doc/key_map.md
+++ b/cpp/src/arrow/acero/doc/key_map.md
@@ -65,7 +65,7 @@ Every slot can either be **empty** or store data related to a 
single inserted ke
 
 Status byte, as the name suggests, stores 8 bits. The highest bit indicates if 
the slot is empty (the highest bit is set) or corresponds to one of inserted 
keys (the highest bit is zero). The remaining 7 bits contain 7 bits of key hash 
that we call a **stamp**. The stamp is used to eliminate some false positives 
when searching for a matching key for a given input. Slot also stores **key 
id**, which is a non-negative integer smaller than the number of inserted keys, 
that is used as a refe [...]
 
-A single block contains 8 slots and can be viewed as a micro-stack of up to 8 
inserted keys. When the first key is inserted into an empty block, it will 
occupy a slot with local id 0. The second inserted key will go into slot number 
1 and so on. We use N highest bits of hash to get an index of a **start 
block**, when searching for a match or an empty slot to insert a previously not 
seen key when that is the case. If the start block contains any empty slots, 
then the search for either a m [...]
+A single block contains 8 slots and can be viewed as a micro-stack of up to 8 
inserted keys. When the first key is inserted into an empty block, it will 
occupy a slot with local id 0. The second inserted key will go into slot number 
1 and so on. We use N highest bits of hash to get an index of a **start 
block**, when searching for a match or an empty slot to insert a previously not 
seen key when that is the case. If the start block contains any empty slots, 
then the search for either a m [...]
 
 The most interesting part of each block is the set of status bytes of its 
slots, which is simply a single 64-bit word. The implementation of efficient 
searches across these bytes during lookups require using either leading zero 
count or trailing zero count intrinsic. Since there are cases when only the 
first one is available, in order to take advantage of it, we order the bytes in 
the 64-bit status word so that the first slot within a block uses the highest 
byte and the last one uses the [...]
 
diff --git a/cpp/src/arrow/compute/key_map_internal.cc 
b/cpp/src/arrow/compute/key_map_internal.cc
index 9e6d60ab50..81f1543cef 100644
--- a/cpp/src/arrow/compute/key_map_internal.cc
+++ b/cpp/src/arrow/compute/key_map_internal.cc
@@ -281,7 +281,7 @@ void SwissTable::early_filter_imp(const int num_keys, const 
uint32_t* hashes,
 // When we reach this limit, we need to break processing of any further rows 
and resize.
 //
 uint64_t SwissTable::num_groups_for_resize() const {
-  // Resize small hash tables when 50% full (up to 12KB).
+  // Resize small hash tables when 50% full (up to 32KB).
   // Resize large hash tables when 75% full.
   constexpr int log_blocks_small_ = 9;
   uint64_t num_slots = 1ULL << (log_blocks_ + 3);

Reply via email to