rtpsw commented on code in PR #13880:
URL: https://github.com/apache/arrow/pull/13880#discussion_r960634730


##########
cpp/src/arrow/compute/exec/asof_join_node.cc:
##########
@@ -294,10 +452,22 @@ class InputState {
   // Index of the time col
   col_index_t time_col_index_;
   // Index of the key col
-  col_index_t key_col_index_;
+  vec_col_index_t key_col_index_;
+  // Type id of the time column
+  Type::type time_type_id_;
+  // Type id of the key column
+  std::vector<Type::type> key_type_id_;
+  // Hasher for key elements
+  mutable KeyHasher* key_hasher_;
+  // True if hashing is mandatory
+  bool must_hash_;
+  // True if null by-key values are expected
+  bool nullable_by_key_;

Review Comment:
   According to [benchmark 
results](https://github.com/apache/arrow/pull/13880#issuecomment-1234248082), 
the latest code in this PR is ~15% slower than its baseline. The benchmark 
only covers the single-key case, i.e. the fast path, so this cost is already 
incurred without any hashing. We can expect the hashing path to be slower 
still.
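
   To make the fast-path distinction concrete, here is a rough sketch of the 
idea. It is not the code in this PR: `RowKey`, `CombineKeys`, and the choice of 
FNV-1a as the hash are assumptions made up for illustration. A single integer 
key is used directly, while multiple keys are folded into one 64-bit value.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch, not the PR's implementation.
using KeyValue = std::uint64_t;

// Fold all by-key values of one row into a single 64-bit key using FNV-1a
// (an assumed stand-in for whatever hasher the node actually uses).
inline KeyValue CombineKeys(const std::vector<std::int64_t>& row_keys) {
  KeyValue h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
  for (std::int64_t k : row_keys) {
    const std::uint64_t v = static_cast<std::uint64_t>(k);
    for (int i = 0; i < 8; ++i) {
      h ^= (v >> (8 * i)) & 0xFF;  // mix in one byte at a time
      h *= 1099511628211ULL;       // FNV-1a 64-bit prime
    }
  }
  return h;
}

// Fast path: a single key that does not require hashing is used as-is.
// Slow path: everything else goes through CombineKeys.
inline KeyValue RowKey(const std::vector<std::int64_t>& row_keys,
                       bool must_hash) {
  if (!must_hash && row_keys.size() == 1) {
    return static_cast<KeyValue>(row_keys[0]);
  }
  return CombineKeys(row_keys);
}
```

   In a sketch like this the fast path does no per-row hashing at all, which is 
why a slowdown measured on a single-key benchmark has to come from somewhere 
other than the hasher.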
   
   I'm fine with getting rid of the `nullable_by_key` option. If it were up to 
me, I'd vote for spending a bit more time on performance tuning thereafter.


