adriangb commented on PR #21666:
URL: https://github.com/apache/datafusion/pull/21666#issuecomment-4260535615

Yes, and I think this was another band-aid, although it's closer to the root
cause than previous attempts. The problem has to do with cancellation when
multiple joins are involved.
   
TL;DR: here is what I think is happening. When you have multiple joins, you
end up with a tree of operators. A join higher up in the tree hits the new
optimization and aborts its work, dropping tasks that would have polled the
downstream joins. The downstream join is then stuck waiting for all of its
partition tasks to finish, even though they never will. I think we were all
operating under the assumption that the issue was within a single join
operator, but really it's an issue any time an upstream operator cancels a
join.
   
I think the real solution is to track when a join's build-side partition task
gets dropped and report that to the dynamic filter builder, so that it doesn't
wait for that partition to report.
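
One way to sketch that idea, assuming a hypothetical coordinator that counts
outstanding build partitions (`BuildCoordinator`, `PartitionGuard`, and
`PartitionOutcome` are illustrative names, not DataFusion's API), is an RAII
guard owned by each partition's build task. If the task finishes, it reports
`Completed`; if it is dropped first, the guard's `Drop` impl reports
`Dropped`, so the pending count still reaches zero:

```rust
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PartitionOutcome {
    Completed,
    Dropped,
}

#[derive(Default)]
struct State {
    pending: usize,
    completed: usize,
    dropped: usize,
}

/// Hypothetical coordinator; not DataFusion's API.
struct BuildCoordinator {
    state: Mutex<State>,
}

impl BuildCoordinator {
    fn new(partitions: usize) -> Arc<Self> {
        Arc::new(Self {
            state: Mutex::new(State { pending: partitions, ..Default::default() }),
        })
    }

    /// Every partition reports exactly once, whether it finished or was dropped.
    fn report(&self, outcome: PartitionOutcome) {
        let mut s = self.state.lock().unwrap();
        s.pending -= 1;
        match outcome {
            PartitionOutcome::Completed => s.completed += 1,
            PartitionOutcome::Dropped => s.dropped += 1,
        }
    }

    fn is_ready(&self) -> bool {
        self.state.lock().unwrap().pending == 0
    }

    /// Returns (completed, dropped) partition counts.
    fn counts(&self) -> (usize, usize) {
        let s = self.state.lock().unwrap();
        (s.completed, s.dropped)
    }
}

/// Hypothetical guard held by a partition's build task.
struct PartitionGuard {
    coordinator: Arc<BuildCoordinator>,
    reported: bool,
}

impl PartitionGuard {
    fn new(coordinator: &Arc<BuildCoordinator>) -> Self {
        Self { coordinator: Arc::clone(coordinator), reported: false }
    }

    /// Normal path: the build for this partition finished.
    fn complete(mut self) {
        self.coordinator.report(PartitionOutcome::Completed);
        self.reported = true;
        // `self` is dropped here; Drop sees `reported == true` and does nothing.
    }
}

impl Drop for PartitionGuard {
    /// Cancellation path: the task was dropped before it could complete.
    fn drop(&mut self) {
        if !self.reported {
            self.coordinator.report(PartitionOutcome::Dropped);
        }
    }
}

/// Partition 0 finishes; partition 1 is dropped by an upstream cancellation.
fn simulate_cancellation_with_guard() -> (bool, (usize, usize)) {
    let coord = BuildCoordinator::new(2);
    let g0 = PartitionGuard::new(&coord);
    let g1 = PartitionGuard::new(&coord);
    g0.complete();
    drop(g1); // Drop reports `Dropped`, so the coordinator no longer waits on it
    (coord.is_ready(), coord.counts())
}
```

In async Rust, dropping a future drops the values it owns, so a guard held by
a partition task's future would fire when that task is cancelled by being
dropped. How the dynamic filter should treat a `Dropped` partition is left
open in this sketch.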

