rtpsw commented on code in PR #13880:
URL: https://github.com/apache/arrow/pull/13880#discussion_r954269544


##########
cpp/src/arrow/compute/exec/asof_join_node.cc:
##########
@@ -294,10 +452,22 @@ class InputState {
   // Index of the time col
   col_index_t time_col_index_;
   // Index of the key col
-  col_index_t key_col_index_;
+  vec_col_index_t key_col_index_;
+  // Type id of the time column
+  Type::type time_type_id_;
+  // Type id of the key column
+  std::vector<Type::type> key_type_id_;
+  // Hasher for key elements
+  mutable KeyHasher* key_hasher_;
+  // True if hashing is mandatory
+  bool must_hash_;
+  // True if null by-key values are expected
+  bool nullable_by_key_;

Review Comment:
   For performance. There are two code paths for computing a by-key value:
   1. A slower path that uses hashing and works for any kind of by-key (single or multiple columns, of any type).
   2. A faster path that uses [norm-value](https://github.com/apache/arrow/pull/13880/files#diff-5789a42aebc3bd0f5a6db687c78ab0c3eef3e316982f99fe9e0ea21fabed354cR67), which works only for a single by-key column of a primitive type.
   
   We want to use the faster path whenever possible. [This post](https://github.com/apache/arrow/pull/13880#issuecomment-1225741731) explains an alternative that would avoid this flag at the cost of added runtime complexity.
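
   To make the two paths concrete, here is a minimal, standalone sketch. It is not the code in this PR: `ByKey`, `NormalizeKey`, `HashKey`, `MustHash`, and `ComputeByKey` are hypothetical names, the FNV-1a hash is just a stand-in for whatever the real hasher does, and it assumes the flag simply caches which path applies so that the decision is made once rather than per row.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the by-key value type; not the Arrow API.
using ByKey = uint64_t;

// Fast path (assumed shape): a single integral key is widened to 64 bits.
template <typename T>
ByKey NormalizeKey(T value) {
  static_assert(std::is_integral<T>::value, "fast path needs an integral key");
  return static_cast<ByKey>(value);
}

// Slow path (stand-in): FNV-1a over the bytes of every key column value.
// Works for any number of key columns, at the cost of hashing each row.
ByKey HashKey(const std::vector<int64_t>& key_values) {
  ByKey h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
  for (int64_t kv : key_values) {
    uint64_t v = static_cast<uint64_t>(kv);
    for (int i = 0; i < 8; ++i) {
      h ^= (v >> (8 * i)) & 0xffULL;
      h *= 1099511628211ULL;  // FNV-1a 64-bit prime
    }
  }
  return h;
}

// Decided once per input, then cached in a flag (assumption about the flag).
bool MustHash(std::size_t num_key_columns, bool key_is_primitive) {
  return !(num_key_columns == 1 && key_is_primitive);
}

// Per-row dispatch: only a cheap branch on the cached flag.
ByKey ComputeByKey(bool must_hash, const std::vector<int64_t>& key_values) {
  if (!must_hash) {
    return NormalizeKey(key_values[0]);
  }
  return HashKey(key_values);
}
```

   With this shape, a single `int64` key such as `{42}` maps straight to `42` on the fast path, while a two-column key such as `{1, 2}` always goes through `HashKey`.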



