rtpsw commented on code in PR #13880: URL: https://github.com/apache/arrow/pull/13880#discussion_r965531866
##########
cpp/src/arrow/compute/exec/asof_join_node.cc:
##########
@@ -294,10 +473,22 @@ class InputState {
   // Index of the time col
   col_index_t time_col_index_;
   // Index of the key col
-  col_index_t key_col_index_;
+  std::vector<col_index_t> key_col_index_;
+  // Type id of the time column
+  Type::type time_type_id_;
+  // Type id of the key column
+  std::vector<Type::type> key_type_id_;
+  // Hasher for key elements
+  mutable KeyHasher* key_hasher_;
+  // True if hashing is mandatory
+  bool must_hash_;
+  // True if by-key values may be rehashed
+  bool may_rehash_;

Review Comment:
   Rehash means hashing again using a different evaluation for the by-key. There are two such evaluations: one for the fast-path, using `key_value`, and one for the slow-path, using `Hashing64`. The former applies to a single-key of a supported type; it is the identity function, which can be viewed as perfect-hashing over the non-null values of the type.

   A rehash happens when both of the following hold:
   1. Initially, the fast-path was used. As noted, this occurs when the by-key is a single-key of a supported type.
   2. Later on, a switch-over to the slow-path becomes necessary. This occurs when null values are first observed for the by-key.

   The fast-path may be used for many batches until a batch shows up that causes a switch-over to the slow-path. This switch-over involves rehashing, i.e., recreating the memo-store with the slow-path's evaluation of the keys. Once a switch-over happens, it is never reversed.

   As for the member fields here: `must_hash_` indicates whether hashing is mandatory, i.e., whether the slow-path is being used, be it initially or after a switch-over, and `may_rehash_` indicates whether a rehash may still happen in the future.
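   To make the switch-over concrete, here is a minimal standalone sketch of the idea. It is not the code in this PR: `MemoStore`, `Entry`, `Mix`, and `kNullHash` are hypothetical names, a single `int64_t` key stands in for "a single-key of a supported type", `std::nullopt` models a null, and a simple 64-bit mixer stands in for `Hashing64`. It assumes C++17 (for `std::optional`) and, like any 64-bit hashing scheme, assumes collisions between distinct keys are negligible.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for illustration; none of these names come from the PR.
using Key = std::optional<int64_t>;  // std::nullopt models a null by-key value

class MemoStore {
 public:
  // Records the latest row for a key. The first null key forces the switch-over.
  void Store(Key key, int64_t row) {
    if (!key.has_value() && may_rehash_) Rehash();
    entries_[Evaluate(key)] = Entry{key, row};
  }

  // Returns the latest row stored for the key, if any.
  std::optional<int64_t> Find(Key key) const {
    // A fast-path store cannot hold a null key: storing one would have rehashed.
    if (!key.has_value() && !must_hash_) return std::nullopt;
    auto it = entries_.find(Evaluate(key));
    if (it == entries_.end()) return std::nullopt;
    return it->second.row;
  }

  bool must_hash() const { return must_hash_; }
  bool may_rehash() const { return may_rehash_; }

 private:
  struct Entry {
    Key key;      // original key, kept so the store can be rebuilt on rehash
    int64_t row;  // payload, e.g. the latest row index for this key
  };

  static constexpr uint64_t kNullHash = 0;  // assumed slot for the null key

  // Stand-in for the slow-path hasher (splitmix64-style mixing).
  static uint64_t Mix(uint64_t z) {
    z += 0x9e3779b97f4a7c15ULL;
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
  }

  // Fast path: identity on the value. Slow path: a hash that also covers null.
  uint64_t Evaluate(Key key) const {
    if (!must_hash_) return static_cast<uint64_t>(*key);
    return key.has_value() ? Mix(static_cast<uint64_t>(*key)) : kNullHash;
  }

  // Recreates the store using the slow-path evaluation of every stored key.
  void Rehash() {
    assert(may_rehash_);
    must_hash_ = true;    // from now on Evaluate() takes the slow path
    may_rehash_ = false;  // the switch-over is never reversed
    std::unordered_map<uint64_t, Entry> rebuilt;
    for (const auto& kv : entries_) rebuilt[Evaluate(kv.second.key)] = kv.second;
    entries_ = std::move(rebuilt);
  }

  std::unordered_map<uint64_t, Entry> entries_;
  bool must_hash_ = false;  // this sketch always starts on the fast path
  bool may_rehash_ = true;  // a rehash is possible until the first null arrives
};
```

   In this sketch, storing keys 7 and 42 leaves `must_hash()` false and `may_rehash()` true; storing a null key afterwards rebuilds the store, flips both flags, and lookups for 7 and 42 still return their earlier rows. A store that has to hash from the start (for example, a multi-column by-key) would begin with `must_hash_` true and `may_rehash_` false, which the sketch leaves out for brevity.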