[GitHub] [arrow] westonpace commented on pull request #10845: ARROW-13268 [C++][Compute] Add ExecNode for semi and anti-semi join

GitBox Tue, 24 Aug 2021 14:22:45 -0700


westonpace commented on pull request #10845:
URL: https://github.com/apache/arrow/pull/10845#issuecomment-904986082

Digging further, I can reproduce it reliably if I configure the producer (the
background generator in test_util.h) to be eventually slow and the source node
to be initially slow (the Loop method in start producing). Since the loop
method is initially slow, the futures are already finished and it runs on the
unit test thread. Once the loop moves faster and the background generator
slows down, the futures are not already finished and they get queued on the
thread pool.
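
To illustrate the timing dependence, here is a minimal sketch using a toy
future. `ToyFuture` and `CallbackThread` are hypothetical names made up for this
example, not Arrow's API. The assumption is only that a callback added to an
already-finished future runs right away on the calling thread, while a callback
added to an unfinished future runs later on whichever thread finishes it (in
the toy that is a plain producer thread standing in for the thread pool).

```cpp
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Toy future for illustration only; this is NOT arrow::Future.
class ToyFuture {
 public:
  // If the future is already finished, run `cb` now on the calling thread.
  // Otherwise store it; the thread that calls MarkFinished() will run it.
  void AddCallback(std::function<void()> cb) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (!finished_) {
        callbacks_.push_back(std::move(cb));
        return;
      }
    }
    cb();
  }

  // Marks the future finished and runs any stored callbacks on this thread.
  void MarkFinished() {
    std::vector<std::function<void()>> to_run;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      finished_ = true;
      to_run.swap(callbacks_);
    }
    for (auto& cb : to_run) cb();
  }

 private:
  std::mutex mutex_;
  bool finished_ = false;
  std::vector<std::function<void()>> callbacks_;
};

// Returns the id of the thread that ended up running the callback.
std::thread::id CallbackThread(bool finish_before_callback) {
  ToyFuture fut;
  std::thread::id ran_on;
  if (finish_before_callback) {
    // Fast producer, slow consumer: the future is finished before we look.
    std::thread producer([&] { fut.MarkFinished(); });
    producer.join();
    fut.AddCallback([&] { ran_on = std::this_thread::get_id(); });
  } else {
    // Slow producer, fast consumer: the callback is stored and runs later.
    fut.AddCallback([&] { ran_on = std::this_thread::get_id(); });
    std::thread producer([&] { fut.MarkFinished(); });
    producer.join();
  }
  return ran_on;
}
```

Calling `CallbackThread(true)` from the test thread returns the test thread's
id, while `CallbackThread(false)` returns the producer thread's id, so which
thread runs the downstream work depends purely on timing.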

The test itself seems to be trying to test a parallel case (parallel=true),
yet it still isn't specifying an exec_context with an executor. Adding an
executor to the exec_context triggers a different race condition here:
https://github.com/apache/arrow/blob/523b618d4e7317bc8a09ca7025ae4688c07b0bdc/cpp/src/arrow/compute/exec/hash_join_node.cc#L213

`cached` is a reference to an element in `cached_probe_batches`, which is
cleared soon afterwards, leaving the reference dangling.
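
As a rough sketch of the hazard and one possible way around it (not
necessarily the fix this PR should adopt): `Batch` and `DrainCache` below are
hypothetical stand-ins, and the real node holds exec batches rather than
strings. The idea is to take ownership of the cached batches before the cache
is cleared, instead of holding a reference into it.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cached probe batch.
using Batch = std::string;

// Unsafe pattern, shown only as a comment because running it is undefined
// behaviour:
//   const Batch& cached = cached_probe_batches[0];
//   cached_probe_batches.clear();  // destroys the element `cached` refers to
//   Process(cached);               // reads through a dangling reference

// Safer pattern: move the cached batches out, then clear the cache. The
// caller owns the returned batches, so later clears cannot invalidate them.
std::vector<Batch> DrainCache(std::vector<Batch>& cached_probe_batches) {
  std::vector<Batch> drained = std::move(cached_probe_batches);
  // A moved-from vector is valid but unspecified; clear() guarantees empty.
  cached_probe_batches.clear();
  return drained;
}
```

In a multi-threaded node the move and clear would still need to happen under
whatever lock guards the cache; the sketch only addresses object lifetime.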

So, my recommendation for this PR is to change all tests (parallel or not)
to use an exec context which specifies an executor. Then work through those
issues.

I don't think it is valid to NOT have an executor (and I will be opening a
separate JIRA dedicated to that issue).

