rtpsw commented on code in PR #34392:
URL: https://github.com/apache/arrow/pull/34392#discussion_r1203534738
##########
cpp/src/arrow/acero/asof_join_node.cc:
##########
@@ -268,11 +397,18 @@ struct MemoStore {
} else {
future_entries_[key].emplace(time, batch, row);
}
- if (!no_future_ || times_.empty() || times_.front() != time) {
+ // Maintain distinct times:
+ // If no times are currently maintained then the given time is distinct
and hence
+ // pushed. Otherwise, the invariant is that the latest time is at the
back. If no
+ // future times are maintained, then only one time is maintained, and
hence it is
+ // overwritten with the given time. Otherwise, the given time must be no
less than the
+ // latest time, due to time ordering, so it is pushed back only if it is
distinct.
+ if (times_.empty() || (!no_future_ && times_.back() != time)) {
times_.push_back(time);
} else {
- times_.front() = time;
+ times_.back() = time;
}
+ current_time_ = time;
Review Comment:
> I'm curious, why don't we use `UpdateTime` here?
This is because the `time` argument to `MemoStore::Store` here is always the
most advanced seen by the `MemoStore` instance - I'll add a doc of this. Using
`UpdateTime(time)` won't harm but could lead one to think that it might fail to
update, which isn't the case.
> Also, it looks like the for_time argument is unused?
It's used just in the debug-message. I'll make this explicit using
`DEBUG_ADD`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]