alamb commented on issue #23093: URL: https://github.com/apache/datafusion/issues/23093#issuecomment-4770428368
This is an interesting idea -- thank you for filing it @avantgardnerio.

One question I have is how this will work with specific operators. For example, since DataFusion's `ExecutionPlan` is streaming, most operators will not have seen all of their input when their parents start executing, so when they start they won't know their actual output partitioning. How, then, would the parent operators know when they can rely on the partitioning declaration of their children?

I do see that `single-partition window functions` is an example of an operator that might buffer its entire input before producing output, so in that case the parent could ask before the first row is produced.

But if the point is that this mechanism only works for "pipeline breaking operators", maybe it is something we could handle at a higher level, for example by building some sort of "adaptive query processing" into DataFusion (a way to re-optimize the plan after some blocking operator has run and we know more about the statistics).
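
To make the streaming versus pipeline-breaking distinction concrete, here is a toy sketch in plain Rust. It is not DataFusion code: `Streaming`, `Blocking`, and `RuntimeInfo` are hypothetical names, batches are just vectors of keys, and "number of distinct keys" stands in for any runtime fact such as the actual output partitioning.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a fact that is only known at runtime
/// (for example, an operator's actual output partitioning).
#[derive(Debug, Clone, PartialEq)]
enum RuntimeInfo {
    /// Input is not fully consumed, so nothing can be promised yet.
    Unknown,
    /// All input was seen; this many distinct keys were observed.
    DistinctKeys(usize),
}

/// Toy streaming operator: forwards each batch as soon as it arrives.
struct Streaming {
    seen: BTreeSet<u64>,
    done: bool,
}

impl Streaming {
    fn new() -> Self {
        Streaming { seen: BTreeSet::new(), done: false }
    }

    /// The parent receives rows here, before the input is exhausted.
    fn push(&mut self, batch: Vec<u64>) -> Vec<u64> {
        self.seen.extend(batch.iter().copied());
        batch
    }

    fn finish(&mut self) {
        self.done = true;
    }

    fn runtime_info(&self) -> RuntimeInfo {
        if self.done {
            RuntimeInfo::DistinctKeys(self.seen.len())
        } else {
            RuntimeInfo::Unknown
        }
    }
}

/// Toy pipeline-breaking operator: buffers everything, emits only at the end.
struct Blocking {
    buffer: Vec<u64>,
    seen: BTreeSet<u64>,
    done: bool,
}

impl Blocking {
    fn new() -> Self {
        Blocking { buffer: Vec::new(), seen: BTreeSet::new(), done: false }
    }

    /// Nothing is emitted while input is still arriving.
    fn push(&mut self, batch: Vec<u64>) {
        self.seen.extend(batch.iter().copied());
        self.buffer.extend(batch);
    }

    /// The first output row appears only after all input has been seen.
    fn finish(&mut self) -> Vec<u64> {
        self.done = true;
        std::mem::take(&mut self.buffer)
    }

    fn runtime_info(&self) -> RuntimeInfo {
        if self.done {
            RuntimeInfo::DistinctKeys(self.seen.len())
        } else {
            RuntimeInfo::Unknown
        }
    }
}
```

In this model, a parent of `Streaming` has already consumed rows by the time `runtime_info()` stops returning `Unknown`, whereas a parent of `Blocking` can query it between `finish()` and the first row it processes.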
