Dandandan commented on issue #23194: URL: https://github.com/apache/datafusion/issues/23194#issuecomment-4859362533
One point I want to make is that dynamic optimization of different stages (a.k.a. AQE in Spark) can definitely be used in a single-process / single-node environment as well. We still have several pipeline-breaking operators, e.g. sorts and the hash join build side, which basically need to load their full input before they can make progress.

Currently this could be done ad hoc inside each operator, but a (small) framework to mark pipeline-breaking stages and dynamically re-optimize the plan sounds like a more principled way of doing things.

Some obvious examples include:

* Join reordering
* Pushing down dynamic filters (currently done inside operators)
* Updating statistics (for joins, aggregations, or parallel merge)
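To make the idea concrete, here is a toy sketch of re-optimizing after a pipeline-breaking stage finishes. It is not DataFusion code: `Input`, `HashJoin`, `run_stage`, and `reoptimize` are hypothetical names invented for illustration, and the only "optimization" modeled is picking the smaller input as the hash join build side once real row counts replace estimates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Input:
    """A join input; hypothetical stand-in for the output of a pipeline-breaking stage."""
    name: str
    est_rows: int                      # planner estimate
    actual_rows: Optional[int] = None  # known only after the stage has run

    @property
    def rows(self) -> int:
        # Prefer the observed row count over the estimate when available.
        return self.est_rows if self.actual_rows is None else self.actual_rows


@dataclass
class HashJoin:
    build: Input
    probe: Input


def run_stage(inp: Input, produced_rows: int) -> None:
    """Stand-in for executing a stage to completion: records the true row count."""
    inp.actual_rows = produced_rows


def reoptimize(join: HashJoin) -> HashJoin:
    """Put the smaller input on the build side, using the best-known row counts."""
    if join.probe.rows < join.build.rows:
        return HashJoin(build=join.probe, probe=join.build)
    return join


# Static planning: estimates say `customers` is the smaller side.
orders = Input("sorted_orders", est_rows=1_000_000)
customers = Input("customers", est_rows=50_000)
plan = reoptimize(HashJoin(build=orders, probe=customers))
print("before:", plan.build.name)

# The sort stage completes and turns out to produce far fewer rows than estimated.
run_stage(orders, produced_rows=200)
plan = reoptimize(plan)
print("after:", plan.build.name)
```

The point of the sketch is only the control flow: a stage boundary is where exact statistics become available, so it is a natural place to re-run (part of) the optimizer rather than having each operator handle this on its own.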
