alamb commented on issue #23093:
URL: https://github.com/apache/datafusion/issues/23093#issuecomment-4770428368

   This is an interesting idea -- thank you for filing it @avantgardnerio 
   
   One question I have is how this will work with specific operators. For 
example, since DataFusion's `ExecutionPlan` is streaming, most operators will 
not have seen all of their input when their parents start executing, so when 
they start they won't know their actual output partitioning.
   
   Thus how would the parent operators know when they could rely on the 
partitioning declaration of their children?
   
   I do see that a single-partition window function is an example of an 
operator that might buffer its entire input before producing output, so in that 
case the parent could ask before the first row is produced. 
   
   But if the point is that this mechanism only works for "pipeline breaking 
operators", maybe it is something we could handle at a higher level -- for 
example, by building some sort of more "adaptive query processing" into 
DataFusion (a way to re-optimize the plan after some blocking operator has run 
and we know more about the statistics).
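
   To make the distinction above concrete, here is a minimal toy sketch in 
plain Rust. It is not DataFusion code: `Streaming`, `Blocking`, and 
`actual_distinct_keys` are hypothetical names, and "number of distinct keys" 
is only a stand-in for whatever runtime partitioning property a parent might 
want to ask about. It assumes the property can only be reported once an 
operator has consumed all of its input.

```rust
use std::collections::BTreeSet;

/// Hypothetical streaming operator: forwards each row as soon as it arrives.
struct Streaming<I: Iterator<Item = u32>> {
    input: I,
    keys: BTreeSet<u32>,
    done: bool,
}

impl<I: Iterator<Item = u32>> Streaming<I> {
    fn new(input: I) -> Self {
        Streaming { input, keys: BTreeSet::new(), done: false }
    }

    /// The runtime property is unknown (`None`) until the input is exhausted.
    fn actual_distinct_keys(&self) -> Option<usize> {
        if self.done { Some(self.keys.len()) } else { None }
    }
}

impl<I: Iterator<Item = u32>> Iterator for Streaming<I> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        match self.input.next() {
            Some(key) => {
                self.keys.insert(key);
                Some(key)
            }
            None => {
                self.done = true;
                None
            }
        }
    }
}

/// Hypothetical pipeline breaker: buffers all input before emitting a row.
/// For brevity the buffering happens in `new`; a real operator would
/// typically do it lazily on first poll.
struct Blocking {
    buffered: std::vec::IntoIter<u32>,
    distinct: usize,
}

impl Blocking {
    fn new<I: Iterator<Item = u32>>(input: I) -> Self {
        let rows: Vec<u32> = input.collect();
        let distinct = rows.iter().collect::<BTreeSet<_>>().len();
        Blocking { buffered: rows.into_iter(), distinct }
    }

    /// Known before the first output row, because all input was already seen.
    fn actual_distinct_keys(&self) -> Option<usize> {
        Some(self.distinct)
    }
}

impl Iterator for Blocking {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        self.buffered.next()
    }
}
```

   In this sketch a parent of `Streaming` gets `None` if it asks before or 
during execution, while a parent of `Blocking` can get an answer before 
pulling the first row, which is the case where the proposed mechanism seems 
most applicable.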
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

