rtpsw commented on PR #13880: URL: https://github.com/apache/arrow/pull/13880#issuecomment-1239732953
> Let's do 1 for now.

Will do.

> That being said, don't we only need to worry about collisions within the on field's tolerance? Or do we have to worry about a collision anywhere in the dataset?

It is normal for the on-key value of one row to equal that of the previous row, and the `AsofJoinNode` code handles that fine. The problem arises with the by-key: two rows can have different by-key tuples whose hashes are equal, and there is currently no `AsofJoinNode` code to arbitrate that case. In contrast, the hash-join code uses a Swiss table (as in my item 2, at some performance cost), which does arbitrate it by checking the by-key tuples for equality.
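To illustrate the by-key collision problem, here is a toy Python sketch (not Arrow code) of the right-side state an as-of join keeps: the latest on-value seen for each by-key. The names `toy_hash`, `latest_by_hash_only`, `latest_with_equality_check` and `lookup` are hypothetical. `toy_hash` is deliberately weak so a collision is easy to trigger, and a plain list of entries per hash bucket stands in for the Swiss table. Only the equality check matters here.

```python
def toy_hash(by_key):
    # Hypothetical, deliberately weak hash: sums the fields, so (1, 2) and (2, 1) collide.
    return sum(by_key) % 8


def latest_by_hash_only(right_rows):
    """Tracks the latest on-value per hash of the by-key, with no equality check.

    Rows whose distinct by-key tuples collide overwrite each other's state.
    """
    latest = {}
    for by_key, on_value in right_rows:
        latest[toy_hash(by_key)] = on_value
    return latest


def latest_with_equality_check(right_rows):
    """Tracks the latest on-value per by-key.

    Colliding hashes share a bucket, and entries are told apart by comparing
    the full by-key tuples (the arbitration a hash table performs).
    """
    buckets = {}  # hash -> list of [by_key, latest_on_value]
    for by_key, on_value in right_rows:
        bucket = buckets.setdefault(toy_hash(by_key), [])
        for entry in bucket:
            if entry[0] == by_key:  # equality check arbitrates the collision
                entry[1] = on_value
                break
        else:
            # No entry with an equal by-key yet: add one, even if the hash collides.
            bucket.append([by_key, on_value])
    return buckets


def lookup(buckets, by_key):
    """Returns the latest on-value for by_key, or None if it was never seen."""
    for stored_key, on_value in buckets.get(toy_hash(by_key), []):
        if stored_key == by_key:
            return on_value
    return None
```

With `right_rows = [((1, 2), 100), ((2, 1), 200)]`, the hash-only version stores a single entry whose value is 200, so a left row with by-key `(1, 2)` would be matched against the `(2, 1)` row. The equality-checking version keeps both entries, and `lookup(table, (1, 2))` returns 100.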
