zhuqi-lucas commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2928856563
> > Only those streams that call `poll_next` themselves in a loop, and as a consequence may block for an extended period of time, would need to do this. Are there that many of those?
>
> IIRC yes, and of a few flavors. Sorting unconditionally suffers from this problem. Aggregation suffers from it when its input is unsorted. Windowing is prone too, but only for some window frames. Joins will also suffer from this issue when they collect one side fully. There are also other operators that behave this way in a data-dependent fashion (e.g. partial sorting). I am sure there are others I can't think of right now.
>
> Therefore I still firmly hold the position that we need to either solve the problem universally with some lower-level functionality (which is my preference, but may not be easy to do), or delegate to leaf nodes so that this becomes automatic as long as leaf nodes (whether real sources or synthetic) are implemented to account for it. I remain unconvinced that modifying all operators is the right thing to do.

Got it, thank you @ozankabak.

Updated: I created a POC of the unified solution, and I think it works. I added a physical optimizer rule that automatically inserts yield support at the leaf nodes, and I tested it against the reproducer cases; it works well.
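
Below is a minimal, `std`-only sketch of what "yield support" at a leaf can look like, assuming the approach is to make the leaf stream return `Poll::Pending` every few items. `PollStream` (a stand-in for `futures::Stream`), `RangeSource`, `YieldStream`, `drain_sum`, and the `period` parameter are hypothetical names for illustration, not the PR's code. Because a loop that keeps polling its input now sees `Poll::Pending` periodically, control can flow back to the executor, which is where a task can be cancelled.

```rust
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Hypothetical stand-in for `futures::Stream`, so the sketch needs only `std`.
trait PollStream {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

/// Leaf source that is always ready, like an in-memory `range(...)` scan:
/// it never returns `Poll::Pending` on its own.
struct RangeSource {
    next: u64,
    end: u64,
}

impl PollStream for RangeSource {
    type Item = u64;
    fn poll_next(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<u64>> {
        if self.next < self.end {
            let v = self.next;
            self.next += 1;
            Poll::Ready(Some(v))
        } else {
            Poll::Ready(None)
        }
    }
}

/// Hypothetical leaf wrapper: after every `period` items it wakes itself and
/// returns `Poll::Pending`, handing control back to whoever drives it.
struct YieldStream<S> {
    inner: S,
    period: usize,
    since_yield: usize,
}

impl<S> YieldStream<S> {
    fn new(inner: S, period: usize) -> Self {
        assert!(period > 0, "period must be positive");
        Self { inner, period, since_yield: 0 }
    }
}

impl<S: PollStream + Unpin> PollStream for YieldStream<S> {
    type Item = S::Item;
    fn poll_next(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<S::Item>> {
        let this = &mut *self;
        if this.since_yield >= this.period {
            this.since_yield = 0;
            // Schedule a re-poll, then yield to the executor.
            cx.waker().wake_by_ref();
            return Poll::Pending;
        }
        let item = Pin::new(&mut this.inner).poll_next(cx);
        if let Poll::Ready(Some(_)) = item {
            this.since_yield += 1;
        }
        item
    }
}

/// Waker that does nothing; enough to poll by hand without an async runtime.
fn noop_waker() -> Waker {
    unsafe fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    unsafe fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    // SAFETY: every vtable function ignores the (null) data pointer.
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

/// Drains a stream like a SUM aggregate, counting each `Poll::Pending`.
/// A real operator would propagate each `Pending` upward instead of looping;
/// each one is a point where the executor regains control and can drop the task.
fn drain_sum<S: PollStream<Item = u64> + Unpin>(mut s: S) -> (u64, usize) {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let (mut sum, mut pending) = (0u64, 0usize);
    loop {
        match Pin::new(&mut s).poll_next(&mut cx) {
            Poll::Ready(Some(v)) => sum += v,
            Poll::Ready(None) => return (sum, pending),
            Poll::Pending => pending += 1,
        }
    }
}

fn main() {
    let (sum, pending) = drain_sum(RangeSource { next: 1, end: 11 });
    println!("without yield: sum={sum}, pending={pending}"); // sum=55, pending=0
    let wrapped = YieldStream::new(RangeSource { next: 1, end: 11 }, 4);
    let (sum, pending) = drain_sum(wrapped);
    println!("with yield: sum={sum}, pending={pending}"); // sum=55, pending=2
}
```

The result is unchanged; only the number of points where control returns to the caller differs. Putting this in the leaf rather than in every looping operator matches the "delegate to leaf nodes" option quoted above.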

Before this PR, the following query runs until it finishes, and it cannot be stopped with Ctrl-C:
```sql
SET datafusion.execution.target_partitions = 1;
SELECT SUM(value) FROM range(1, 50000000000);
```
Besides the SQL use cases, when using an execution plan directly, we can simply run the optimizer rule, and it will automatically be applied to all leaf nodes:
```rust
// 3) Optimize the plan with WrapLeaves to auto-insert Yield
let optimized = WrapLeaves::new()
    .optimize(aggr.clone(), &ConfigOptions::new())?;
```
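
The idea behind a rule like `WrapLeaves` can be sketched on a toy plan tree: walk the tree and wrap every node that has no children in a yield node. `Plan` and `wrap_leaves` are hypothetical stand-ins, not DataFusion's `ExecutionPlan` or the PR's actual rule; a real `PhysicalOptimizerRule` works on `Arc<dyn ExecutionPlan>` and would identify leaves as plans whose `children()` is empty.

```rust
/// Hypothetical toy plan tree; not DataFusion's `ExecutionPlan`.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Leaf node: has no children (e.g. a table or `range(...)` scan).
    Scan(String),
    Aggregate(Box<Plan>),
    Join(Box<Plan>, Box<Plan>),
    /// Wrapper the rule inserts above each leaf to add yield points.
    Yield(Box<Plan>),
}

/// Recursive rewrite: wrap every leaf in `Yield`, keep inner nodes as-is.
fn wrap_leaves(plan: Plan) -> Plan {
    match plan {
        Plan::Scan(name) => Plan::Yield(Box::new(Plan::Scan(name))),
        Plan::Aggregate(input) => Plan::Aggregate(Box::new(wrap_leaves(*input))),
        Plan::Join(left, right) => Plan::Join(
            Box::new(wrap_leaves(*left)),
            Box::new(wrap_leaves(*right)),
        ),
        // Already wrapped: keep as-is so running the rule twice is harmless.
        Plan::Yield(inner) => Plan::Yield(inner),
    }
}

fn main() {
    let plan = Plan::Aggregate(Box::new(Plan::Join(
        Box::new(Plan::Scan("t1".into())),
        Box::new(Plan::Scan("t2".into())),
    )));
    let optimized = wrap_leaves(plan);
    // prints: Aggregate(Join(Yield(Scan("t1")), Yield(Scan("t2"))))
    println!("{optimized:?}");
    assert_eq!(wrap_leaves(optimized.clone()), optimized);
}
```

Because only leaves are touched, operators such as aggregation or joins need no changes of their own, and keeping the rule idempotent makes it safe to run more than once in an optimizer pipeline.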
If we agree on this direction, I will add more tests and polish the code. Thanks.
cc @alamb @berkaysynnada @pepijnve