rtpsw commented on code in PR #36499:
URL: https://github.com/apache/arrow/pull/36499#discussion_r1255982569


##########
cpp/src/arrow/acero/asof_join_node.cc:
##########
@@ -524,7 +524,7 @@ class KeyHasher {
   size_t index_;
   std::vector<col_index_t> indices_;
   std::vector<KeyColumnMetadata> metadata_;
-  const RecordBatch* batch_;
+  std::atomic<const RecordBatch*> batch_;

Review Comment:
   > Is the KeyHasher class thread safe now?
   
   I think so - see below for why.
   
   > Since with this change there are two thread using this class, we should 
clearly document what the thread safety model of this class is.
   
   [This 
doc](https://github.com/apache/arrow/blob/fd949f8fb416025dc9d77c035c1cac5f45ed1b94/cpp/src/arrow/acero/asof_join_node.cc#L493-L494)
 describes the supported concurrency. 
   
   > What happens if the hash is invalidated in the middle of processing?
   
   Here's what I think happens. The hashes are always computed for the record 
batch found at the front of the queue. The critical (and probably rare) case is 
when the input-receiving thread gets a record batch during the computation of 
hashes, whence it invalidates the key hasher and pushes the record batch to the 
queue. In this case, the important point is that this pushed record batch 
cannot be at the front of the queue because the queue is not empty and the push 
is to the back of the queue. This invalidation may only lead to a recomputation 
of hashes for the same record batch at the front of the queue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to