jayshrivastava commented on issue #8777:
URL: https://github.com/apache/datafusion/issues/8777#issuecomment-3861190939

   > I think this problem is actually very similar to RepartitionExec and the 
distributor channels in there, except that RepartitionExec sends 1 row to 1 
output, whereas the construct required here would send each row to ALL outputs. 
And we'll likely want a similar buffer limiter, so that the CTE output isn't 
buffered without limit when no consumer polls the data (while still allowing 
multiple consumers to poll concurrently). So I would propose to use the same 
underlying "distributor channel" primitive (or whatever we replace that with).
   
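   This is a great solution. However, we cannot enable it when the consumers 
feed a join: as the diagram below shows, the build side drains X while the 
probe side waits, so a bounded buffer can stall. In this case we should either 
re-execute the shared child or, optionally, buffer all of its output (in memory 
or on disk). Step 1 can be falling back to re-executing the child.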
   ```
                    (CTE / shared subplan)  X
                           |
                           v
                    +----------------+
                    |   FanoutExec   |   (bounded buffer N)
                    +----------------+
                      |          |
                      |          |
                      v          v
             +----------------+   +----------------+
             | HashJoin BUILD |   | HashJoin PROBE |
             |  (drains X)    |   | (waits!)       |
             +----------------+   +----------------+
                      \              /
                       \            /
                        v          v
                       +----------------+
                       |   HashJoin     |
                       +----------------+
   ```
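   To make the stall in the diagram concrete, here is a minimal, 
single-threaded simulation. It is only a sketch under stated assumptions, not 
DataFusion code: `run_fanout`, `Outcome`, and the two queues are hypothetical 
names, and the fan-out is assumed to emit a batch only when every output has 
room.

```rust
use std::collections::VecDeque;

/// Result of driving the toy fan-out until it finishes or stops progressing.
#[derive(Debug, PartialEq)]
enum Outcome {
    Completed,
    /// No party could make progress after `sent` batches were emitted.
    Stalled { sent: usize },
}

/// Hypothetical model: every batch is copied to two bounded queues.
/// The "build" consumer drains eagerly; the "probe" consumer polls nothing
/// until the build side has seen all `total` batches.
fn run_fanout(total: usize, capacity: usize) -> Outcome {
    let mut build_q: VecDeque<usize> = VecDeque::new();
    let mut probe_q: VecDeque<usize> = VecDeque::new();
    let mut sent = 0;
    let mut build_seen = 0;

    loop {
        let mut progressed = false;

        // Producer: assumed to emit only if BOTH outputs have room.
        if sent < total && build_q.len() < capacity && probe_q.len() < capacity {
            build_q.push_back(sent);
            probe_q.push_back(sent);
            sent += 1;
            progressed = true;
        }

        // Build side: always drains whatever is available.
        if build_q.pop_front().is_some() {
            build_seen += 1;
            progressed = true;
        }

        // Probe side: starts polling only after the build side is complete.
        if build_seen == total && probe_q.pop_front().is_some() {
            progressed = true;
        }

        if sent == total && build_q.is_empty() && probe_q.is_empty() {
            return Outcome::Completed;
        }
        if !progressed {
            return Outcome::Stalled { sent };
        }
    }
}

fn main() {
    // More batches than the buffer can hold: the probe queue fills up.
    println!("{:?}", run_fanout(10, 4));
    // Everything fits in the buffer: the join can finish.
    println!("{:?}", run_fanout(3, 4));
}
```

   In this model the fan-out only completes when the whole shared output fits 
in the buffer, which is why the fallback options are re-executing the child or 
buffering all of its output.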
   This might apply to nodes other than joins. Thinking out loud, I think the 
general check is: if any common ancestor of two parents of "FanoutExec" has 
dependencies between its children, then remove the "FanoutExec". Whether a node 
"has dependencies between its children" is hard to check for; maybe a new 
property on `ExecutionPlan` would help with that. We might also need some sort 
of cycle detection to find common ancestors.
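
   A rough sketch of that check on a toy plan graph follows. Everything here 
is an assumption for illustration: `Node`, `children_are_dependent`, 
`reaches`, and `blocking_ancestors` are hypothetical names and not part of 
DataFusion's `ExecutionPlan` API. The boolean field stands in for the proposed 
new property.

```rust
use std::collections::HashSet;

/// Toy plan node; `children` holds indices into the plan's node vector.
struct Node {
    name: &'static str,
    children: Vec<usize>,
    /// Hypothetical property: true if the node must finish one child before
    /// it polls another (e.g. a hash join's build side vs. probe side).
    children_are_dependent: bool,
}

/// Returns true if `target` is reachable from `from` (inclusive).
/// The `seen` set keeps the walk from revisiting shared nodes.
fn reaches(plan: &[Node], from: usize, target: usize, seen: &mut HashSet<usize>) -> bool {
    if from == target {
        return true;
    }
    if !seen.insert(from) {
        return false;
    }
    plan[from]
        .children
        .iter()
        .any(|&c| reaches(plan, c, target, seen))
}

/// Names of nodes that would make a bounded fan-out at `fanout` unsafe:
/// nodes with dependent children where at least two children lead to it.
fn blocking_ancestors(plan: &[Node], fanout: usize) -> Vec<&'static str> {
    plan.iter()
        .filter(|n| n.children_are_dependent)
        .filter(|n| {
            n.children
                .iter()
                .filter(|&&c| reaches(plan, c, fanout, &mut HashSet::new()))
                .count()
                >= 2
        })
        .map(|n| n.name)
        .collect()
}

/// Builds the shape from the diagram: top -> {left, right} -> fan-out -> X.
fn sample_plan(top_name: &'static str, dependent: bool) -> Vec<Node> {
    vec![
        Node { name: top_name, children: vec![1, 2], children_are_dependent: dependent },
        Node { name: "left", children: vec![3], children_are_dependent: false },
        Node { name: "right", children: vec![3], children_are_dependent: false },
        Node { name: "FanoutExec", children: vec![4], children_are_dependent: false },
        Node { name: "X", children: vec![], children_are_dependent: false },
    ]
}

fn main() {
    // A join-like ancestor blocks the fan-out; a union-like one does not.
    println!("{:?}", blocking_ancestors(&sample_plan("HashJoin", true), 3));
    println!("{:?}", blocking_ancestors(&sample_plan("Union", false), 3));
}
```

   If the returned list is non-empty, the planner would remove the fan-out and 
fall back to re-executing the shared child.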
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

