rtpsw commented on PR #34392:
URL: https://github.com/apache/arrow/pull/34392#issuecomment-1531964894
**First problem: hang on distant times**
What was the problem? In a future as-of-join, when the right table's next
timestamp was distant from the left table's (i.e., beyond the future
tolerance), the join hung.
What was the cause of the problem? The as-of-join node mishandled the
`MemoStore` maintenance, failed to advance any of the tables, and went into an
infinite loop. The as-of-join node had not previously been tested with distant
times.
What was the fix and how? Commits
[0002197](https://github.com/apache/arrow/pull/34392/commits/000219736c0abb2e0c9525963f083c0ab04d16cc)
and
[0dc7fa2](https://github.com/apache/arrow/pull/34392/commits/0dc7fa2339ddfa73be30f4780a35f3ca057f8957)
fixed the `MemoStore` maintenance and the condition for advancing, and added a
test case with distant times.
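For readers unfamiliar with the advancing logic, here is a minimal Python sketch of a simplified future as-of join with a single right table. It is not the Arrow C++ implementation. The following are all assumptions made for illustration:
- the function `future_asof_join`;
- the per-key `memo` dict, which only loosely stands in for the `MemoStore`;
- the matching rule, which picks the earliest same-key right row with time in `[left_time, left_time + tolerance]`.

What it shows is the property the fix restores. When the right table's next timestamp lies beyond the tolerance, the left row is emitted without a match, so the loop always advances instead of waiting.

```python
from collections import defaultdict, deque


def future_asof_join(left, right, tolerance):
    """Simplified future as-of join (illustrative model, not Arrow's code).

    left:  list of (time, key), sorted by time
    right: list of (time, key, value), sorted by time
    Each left row is matched with the earliest right row of the same key
    whose time lies in [left_time, left_time + tolerance], else None.
    """
    # Hypothetical per-key store of pending right rows (a loose stand-in
    # for the MemoStore, which is more involved in Arrow).
    memo = defaultdict(deque)
    out = []
    r = 0
    for lt, key in left:
        # Pull in every right row that could still match this left row.
        while r < len(right) and right[r][0] <= lt + tolerance:
            rt, rkey, rval = right[r]
            memo[rkey].append((rt, rval))
            r += 1
        # Right rows older than lt can never match this or any later left
        # row, because left times are non-decreasing.
        q = memo[key]
        while q and q[0][0] < lt:
            q.popleft()
        # Always advance the left side: if the next right row is beyond the
        # tolerance, emit "no match" instead of waiting for it.
        out.append((lt, key, q[0][1] if q else None))
    return out


# Distant right timestamp: the join terminates with no match.
print(future_asof_join([(0, "a")], [(1000, "a", "z")], 5))  # [(0, 'a', None)]
```

The design choice worth noting is that termination depends only on the left loop making progress on every iteration. A condition that instead waits for the right side to "catch up" can spin forever when the right side is already ahead by more than the tolerance.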
**Second problem: non-deterministic output**
What was the problem? The newly added test case occasionally produced different
output, and only on specific platforms/CI jobs.
What was the cause of the problem? The as-of-join node used cached hashes,
instead of computing new ones, for the key columns of a new batch. This
happened because the new batch had the same pointer address as the previous
one, and the cache-invalidation condition relied on the pointer address
changing for each new batch. This rare same-address condition, which
triggered the bug, occurred only on specific platforms/CI jobs, likely under
restricted memory resources.
What was the fix and how? The
[fix](https://github.com/apache/arrow/pull/34392/commits/34134f0b40ea241fa3c686df06c487d0539b9c99)
added cache invalidation upon receiving a new batch, allowing the hashes to be
recomputed.
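The invalidation bug can be illustrated with a small Python model. Everything in it is hypothetical:
- `Batch.address` simulates the pointer address the allocator returned for a batch;
- `PointerKeyedHashCache` and `InvalidatingHashCache`, with its `on_new_batch` method, only mimic the before and after invalidation strategies.

None of them are Arrow's actual hashing code. A new batch that reuses the previous address defeats the address-keyed check, while invalidating on every new batch does not.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    # Hypothetical stand-in for a record batch; `address` models the
    # pointer value that the allocator happened to return for it.
    address: int
    keys: list


class PointerKeyedHashCache:
    """Buggy pattern: recompute only when the batch address changes."""

    def __init__(self):
        self._address = None
        self._hashes = None

    def hashes(self, batch):
        # Misses a new batch that was allocated at a reused address.
        if batch.address != self._address:
            self._address = batch.address
            self._hashes = [hash(k) for k in batch.keys]
        return self._hashes


class InvalidatingHashCache:
    """Fixed pattern: the cache is invalidated whenever a new batch arrives."""

    def __init__(self):
        self._hashes = None

    def on_new_batch(self):
        self._hashes = None  # force recomputation for the next lookup

    def hashes(self, batch):
        if self._hashes is None:
            self._hashes = [hash(k) for k in batch.keys]
        return self._hashes


first = Batch(address=0x1000, keys=["a", "b"])
second = Batch(address=0x1000, keys=["c", "d"])  # new data, reused address

buggy = PointerKeyedHashCache()
buggy.hashes(first)
stale = buggy.hashes(second)  # still the hashes of ["a", "b"]

fixed = InvalidatingHashCache()
fixed.on_new_batch()
fixed.hashes(first)
fixed.on_new_batch()
fresh = fixed.hashes(second)  # hashes of ["c", "d"]
```

Tying invalidation to the event "a new batch arrived" rather than to an identity proxy such as an address avoids depending on allocator behavior. That dependence is why the failure showed up only rarely and only in some environments.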