rtpsw commented on PR #13880:
URL: https://github.com/apache/arrow/pull/13880#issuecomment-1239732953

   > Let's do 1 for now.
   
   Will do.
   
   > That being said, don't we only need to worry about collisions within the 
on field's tolerance? Or do we have to worry about a collision anywhere in the 
dataset?
   
   It is normal for a row's on-key value to equal that of the previous row, 
and the `AsofJoinNode` code handles that fine. The issue arises with the 
by-key: when two rows have different by-key tuples whose hashes are equal, 
there is currently no `AsofJoinNode` code to arbitrate the case. In contrast, 
the hash-join code uses a Swiss table (as in my item 2, at some performance 
cost), which does arbitrate it by checking equality of the by-key tuples.
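
To illustrate the difference, here is a minimal sketch in Python (not Arrow's C++ code). The names `weak_hash`, `group_by_hash_only`, and `group_with_equality_check` are hypothetical, and the hash is deliberately weak so that a collision is easy to produce. It only shows the general idea of grouping by hash alone versus also comparing the key tuples; it is not how `AsofJoinNode` or the Swiss table is actually implemented.

```python
from collections import defaultdict


def weak_hash(by_key):
    # Hypothetical, deliberately weak hash: summing the fields makes
    # (1, 2) and (2, 1) collide.
    return sum(by_key) % 8


def group_by_hash_only(rows):
    # Groups rows by the hash of the by-key alone, so distinct by-key
    # tuples with equal hashes end up in the same group.
    groups = defaultdict(list)
    for by_key, value in rows:
        groups[weak_hash(by_key)].append(value)
    return dict(groups)


def group_with_equality_check(rows):
    # Uses the hash only to pick a bucket, then compares the by-key
    # tuples themselves to decide which group a row belongs to.
    buckets = defaultdict(list)  # hash -> list of (by_key, values)
    for by_key, value in rows:
        bucket = buckets[weak_hash(by_key)]
        for stored_key, values in bucket:
            if stored_key == by_key:
                values.append(value)
                break
        else:
            # No stored tuple is equal: start a new group in this bucket.
            bucket.append((by_key, [value]))
    return {key: values for bucket in buckets.values() for key, values in bucket}


rows = [((1, 2), "a"), ((2, 1), "b"), ((1, 2), "c")]
hash_only = group_by_hash_only(rows)
checked = group_with_equality_check(rows)
```

With these rows, `hash_only` merges all three values into one group because both tuples hash to the same value, while `checked` keeps `(1, 2)` and `(2, 1)` apart at the cost of an extra comparison per lookup.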


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

