comphead commented on PR #23167:
URL: https://github.com/apache/datafusion/pull/23167#issuecomment-4794906771

   > Spark's shuffle is a pipeline breaker; this PR is stating that any 
pipeline breaker gives us the same epistemic guarantee, just at finer 
granularity than full-stage shuffles
   
   Shuffle is not always a pipeline breaker, but the concept of a pipeline 
breaker, an operator that consumes all of its input before producing any 
output, would help in DF if we want to make decisions at runtime. Overall the 
design makes sense to me. But how would you use it for joins? 🤔 
   
   For SMJ (sort-merge join), the sort is a pipeline breaker. 
   What about HJ (hash join) and NLJ (nested loop join)? What would be the 
pipeline breakers for them?
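
To make the term concrete, here is a minimal sketch of the difference between a streaming operator and a pipeline breaker. It is illustrative only: `filter_even` and `sort_breaker` are hypothetical names, not DataFusion APIs, and rows are modeled as plain `i64` values rather than record batches.

```rust
// Hypothetical illustration only; these are not DataFusion types or operators.

/// Streaming operator: passes each row on as soon as it arrives,
/// so nothing is known about the full input while it runs.
fn filter_even(input: impl Iterator<Item = i64>) -> impl Iterator<Item = i64> {
    input.filter(|v| v % 2 == 0)
}

/// Pipeline breaker: drains the whole input before emitting the first row.
/// Because the input is fully consumed, the exact row count is known
/// before any downstream operator starts.
fn sort_breaker(input: impl Iterator<Item = i64>) -> (Vec<i64>, usize) {
    let mut rows: Vec<i64> = input.collect(); // consume everything first
    rows.sort_unstable();
    let row_count = rows.len();
    (rows, row_count)
}

fn main() {
    let input = vec![5, 4, 1, 2, 8].into_iter();
    let (sorted, row_count) = sort_breaker(filter_even(input));
    println!("{:?} ({} rows)", sorted, row_count);
}
```

The point of the sketch is that the exact count only exists after the sort has consumed its input, which is the moment a runtime decision could use it.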
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

