neilconway commented on PR #21240:
URL: https://github.com/apache/datafusion/pull/21240#issuecomment-4170337097
> I am a bit surprised that it is even needed to do some
> speculative/overlapping to get performance parity. AFAIK NestedLoopJoinExec
> doesn't do this (it will execute build side first then probe).

This was surprising to me as well, but on thinking about it further, it
kinda makes sense. `CrossJoinExec::execute()` does
```rust
let stream = self.right.execute(partition, Arc::clone(&context))?;
// ...
let left_fut = self.left_fut.try_once(|| {
    let left_stream = self.left.execute(0, context)?;
    Ok(load_left_input(
        left_stream,
        join_metrics.clone(),
        reservation,
    ))
})?;
Ok(Box::pin(CrossJoinStream {...}))
```
i.e., we start up both inputs and let them do some work if they want to
(e.g., `RepartitionExec::execute()`, `CoalescePartitionsExec::execute()`, and
`SortPreservingMergeExec::execute()` will also kick off some background work
on the first `execute()` call).
Whereas in the current ScalarSubqueryExec implementation, we don't do
*anything* with the main plan until the evaluation of *all* subqueries has
completed, which means we do lose some opportunity to overlap work that NLJ is
able to take advantage of.
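
To make the difference concrete, here is a toy sketch of the two orderings. It is not DataFusion code: `execute_eager`, `cross_join_overlapped`, and `cross_join_sequential` are hypothetical names, and plain threads plus channels stand in for async streams, on the assumption that an input starts background work as soon as `execute()` is called.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for an operator whose `execute()` starts background
/// work right away. `delay` simulates the cost of producing the input.
fn execute_eager(rows: Vec<i64>, delay: Duration) -> mpsc::Receiver<i64> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        thread::sleep(delay);
        for row in rows {
            if tx.send(row).is_err() {
                break; // consumer went away
            }
        }
    });
    rx
}

/// Overlapped ordering: start *both* inputs, then collect the build side,
/// then probe. The two simulated delays run concurrently.
fn cross_join_overlapped(
    left: Vec<i64>,
    right: Vec<i64>,
    delay: Duration,
) -> Vec<(i64, i64)> {
    let right_rx = execute_eager(right, delay);
    let left_rx = execute_eager(left, delay);
    let build: Vec<i64> = left_rx.iter().collect();
    let mut out = Vec::new();
    for r in right_rx.iter() {
        for l in &build {
            out.push((*l, r));
        }
    }
    out
}

/// Sequential ordering: the probe side is not started until the build side
/// has been fully collected, so the two simulated delays add up.
fn cross_join_sequential(
    left: Vec<i64>,
    right: Vec<i64>,
    delay: Duration,
) -> Vec<(i64, i64)> {
    let build: Vec<i64> = execute_eager(left, delay).iter().collect();
    let right_rx = execute_eager(right, delay);
    let mut out = Vec::new();
    for r in right_rx.iter() {
        for l in &build {
            out.push((*l, r));
        }
    }
    out
}
```

Both functions return the same rows; in this toy the sequential version has to wait out the two delays back to back, while the overlapped version lets them run at the same time. That is roughly the opportunity the current ScalarSubqueryExec ordering gives up.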