rtpsw commented on PR #36499:
URL: https://github.com/apache/arrow/pull/36499#issuecomment-1623678477

   >Is KeyHasher thread-safe? AFAIU, key_hasher_ can be used from two threads 
at a time... GetKey calling key_hasher_->HashesFor and Push calling 
key_hasher_->Invalidate.
   
   [This 
doc](https://github.com/apache/arrow/blob/fd949f8fb416025dc9d77c035c1cac5f45ed1b94/cpp/src/arrow/acero/asof_join_node.cc#L493-L494)
 addresses thread-safety of `Invalidate` and `HashesFor`.
   
   > I also don't understand why you call Invalidate explicitly, while the 
KeyHasher is supposed to invalidate automatically.
   
   Presumably, you mean [this 
invocation](https://github.com/apache/arrow/blob/fd949f8fb416025dc9d77c035c1cac5f45ed1b94/cpp/src/arrow/acero/asof_join_node.cc#L502)
 of `Invalidate`. Its for safety - it keeps the KeyHasher invalid if any code 
(now or in the future) fails between that point and [the clean 
return](https://github.com/apache/arrow/blob/fd949f8fb416025dc9d77c035c1cac5f45ed1b94/cpp/src/arrow/acero/asof_join_node.cc#L518-L519).
   
   I suspect that with the explicit invalidation in place, we could just have 
`if (batch)` 
[here](https://github.com/apache/arrow/blob/fd949f8fb416025dc9d77c035c1cac5f45ed1b94/cpp/src/arrow/acero/asof_join_node.cc#L499).
 I could look into this separately. Given past experience, I'd prefer to make 
small steps when dealing with as-of-join-node's concurrency.
   
   > However, I see that KeyHasher is storing a raw RecordBatch pointer, which 
is problematic. If the previous std::shared_ptr<RecordBatch> was destroyed, 
another std::shared_ptr<RecordBatch> could re-allocate the same underlying 
RecordBatch pointer, which is unlikely but not impossible.
   
   The explicit invalidation was introduced to resolve exactly this case, which 
was observed on macOS.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to