rtpsw commented on code in PR #13880:
URL: https://github.com/apache/arrow/pull/13880#discussion_r965531866


##########
cpp/src/arrow/compute/exec/asof_join_node.cc:
##########
@@ -294,10 +473,22 @@ class InputState {
   // Index of the time col
   col_index_t time_col_index_;
   // Index of the key col
-  col_index_t key_col_index_;
+  std::vector<col_index_t> key_col_index_;
+  // Type id of the time column
+  Type::type time_type_id_;
+  // Type id of the key column
+  std::vector<Type::type> key_type_id_;
+  // Hasher for key elements
+  mutable KeyHasher* key_hasher_;
+  // True if hashing is mandatory
+  bool must_hash_;
+  // True if by-key values may be rehashed
+  bool may_rehash_;

Review Comment:
   Rehashing means hashing the by-key again with a different evaluation. 
There are two such evaluations: one for the fast-path, using `key_value`, and 
one for the slow-path, using `Hashing64`. The fast-path applies only when the 
by-key is a single key of a supported type; its evaluation is the identity 
function, which can be viewed as a perfect hash over the non-null values of 
that type.
   
   A rehash happens when both of the following hold:
   1. The fast-path was used initially. As noted, this occurs when the by-key 
is a single key of a supported type.
   2. A switch-over to the slow-path later becomes necessary. This occurs the 
first time null values are observed for the by-key.
   
   The fast-path may be used for many batches before a batch arrives that 
forces a switch-over to the slow-path. The switch-over involves rehashing, 
i.e., recreating the memo-store with the slow-path's evaluation of the keys. 
Once a switch-over happens, it is never reversed.
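
   The switch-over described above can be sketched roughly as follows. This is 
only an illustration, not the code in this PR: `Key`, `Entry`, `MemoStore`, 
`FastEval`, and `SlowEval` are hypothetical names, `std::optional` stands in 
for a nullable by-key value, and a simple 64-bit mixer stands in for 
`Hashing64`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for illustration; not the types in asof_join_node.cc.
using Key = std::optional<uint64_t>;  // std::nullopt models a null by-key value

// Fast-path evaluation: the identity on non-null values.
inline uint64_t FastEval(const Key& key) {
  assert(key.has_value());  // the fast-path never sees nulls
  return *key;
}

// Slow-path evaluation: a stand-in 64-bit mixer that also covers null.
// Collisions are ignored in this sketch.
inline uint64_t SlowEval(const Key& key) {
  if (!key.has_value()) return 0;
  uint64_t x = *key + 0x9E3779B97F4A7C15ULL;
  x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
  x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
  return x ^ (x >> 31);
}

struct Entry {
  Key key;           // original key, kept so the store can be rebuilt
  int64_t time = 0;  // latest time seen for this key
};

class MemoStore {
 public:
  // Records the latest time for a key. Returns true if this call
  // triggered the one-time switch-over to the slow-path.
  bool Store(const Key& key, int64_t time) {
    bool switched = false;
    if (!slow_path_ && !key.has_value()) {
      Rehash();
      switched = true;
    }
    entries_[Eval(key)] = Entry{key, time};
    return switched;
  }

  const Entry* Find(const Key& key) const {
    if (!slow_path_ && !key.has_value()) return nullptr;  // no nulls stored yet
    auto it = entries_.find(Eval(key));
    return it == entries_.end() ? nullptr : &it->second;
  }

  bool slow_path() const { return slow_path_; }
  std::size_t size() const { return entries_.size(); }

 private:
  uint64_t Eval(const Key& key) const {
    return slow_path_ ? SlowEval(key) : FastEval(key);
  }

  // Rebuilds the store under the slow-path evaluation; never reversed.
  void Rehash() {
    std::unordered_map<uint64_t, Entry> rebuilt;
    for (const auto& kv : entries_) rebuilt[SlowEval(kv.second.key)] = kv.second;
    entries_ = std::move(rebuilt);
    slow_path_ = true;
  }

  bool slow_path_ = false;
  std::unordered_map<uint64_t, Entry> entries_;
};
```

   In this sketch, entries stored before the first null remain findable 
afterwards because each entry keeps its original key, which is what allows the 
store to be rebuilt under the other evaluation.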
   
   As for the member fields here: `must_hash_` indicates whether hashing is 
mandatory, i.e., whether the slow-path is in use, either from the start or 
after a switch-over; `may_rehash_` indicates whether a rehash may still happen 
in the future.
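
   To make the flag semantics concrete, here is a small sketch of the states 
the two flags can take, as I read the description above. `HashFlags`, 
`InitialFlags`, and `OnNullObserved` are hypothetical names. The sketch assumes 
that a rehash remains possible whenever the fast-path is in use, which may be 
simpler than the actual initialization logic.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mirror of the two member fields, for illustration only.
struct HashFlags {
  bool must_hash;   // true when the slow-path is in use
  bool may_rehash;  // true while a switch-over can still happen
};

// Assumption: the fast-path is chosen for a single by-key of a supported type.
inline HashFlags InitialFlags(std::size_t num_by_keys, bool type_supported) {
  const bool fast_path = (num_by_keys == 1) && type_supported;
  return HashFlags{!fast_path, fast_path};
}

// The first null on the fast-path forces the one-way switch-over.
inline HashFlags OnNullObserved(HashFlags flags) {
  if (flags.may_rehash) {
    assert(!flags.must_hash);  // a rehash is only pending on the fast-path
    return HashFlags{true, false};
  }
  return flags;  // already on the slow-path; nothing changes
}
```

   Under these assumptions only three combinations are reachable: fast-path 
with a rehash still possible, slow-path from the start, and slow-path after a 
switch-over. The last two carry identical flag values.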



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
