rtpsw commented on code in PR #34392:
URL: https://github.com/apache/arrow/pull/34392#discussion_r1215407511
##########
cpp/src/arrow/acero/asof_join_node.cc:
##########
@@ -1059,7 +1232,7 @@ class AsofJoinNode : public ExecNode {
// If RHS is finished, then we know it's up to date
if (rhs.CurrentEmpty())
return false; // RHS isn't finished, but is empty --> not up to date
- if (lhs_ts >= rhs.GetCurrentTime())
Review Comment:
I looked into this to some degree. When '>=' is used instead, as-of-join
hangs for the `Basic7Forward` test-case, because both right tables have distant
times (right at the beginning) and none of them advances.
I tend to think that '>=' here won't work in general. Consider the case
where the left table is at time `t` and a given right table has multiple rows
with time `t`. In a past-join, we'd want to advance the right table past `t`
and overwrite the stored entry up to the last row with time `t`, which would be
ready for joining. In a future-join, we'd want to advance the same way but to
store all rows except the first as future entries. Now, consider the case where
the given right table has a row with a distant time `t+tolerance+1`. In a
future-join, we'd want to advance the right table to this row and store it but
only make it current when it is no longer distant. So, in a clearer (if not
fixing) solution, I suspect the tolerance horizon needs to be accounted for in
the condition here as well as in the storing/joining.
This needs further looking into in a separate issue.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]