Re: [I] Add AQE to DataFusion [datafusion]

via GitHub Wed, 01 Jul 2026 12:36:51 -0700


Dandandan commented on issue #23194:
URL: https://github.com/apache/datafusion/issues/23194#issuecomment-4859362533


   One point I want to make is that dynamic optimization of different stages 
(a.k.a. AQE in Spark) definitely can be used in a single process / single node 
env as well.
   
   We still have several pipeline-breaking operators, e.g. sorts and the hash 
join build side, which basically need to load the full input before they can 
make progress.
   
   Currently this could be done ad hoc inside each operator, but a (small) 
framework to mark pipeline-breaking stages and dynamically re-optimize the 
plan sounds like a more principled way of doing things.
   
   Some obvious examples include:
   
   * Join reordering
   * Pushing down dynamic filters (currently done inside operators)
   * Updating statistics (for joins, aggregations or parallel merge)
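
   To make the idea a bit more concrete, here is a rough sketch of what marking 
pipeline-breaking stages and re-optimizing after them could look like. Everything 
in it (`Plan`, `Breaker`, `finish_breaker`, `reoptimize`) is hypothetical and made 
up for illustration; none of it is an existing DataFusion API, and the only 
"optimization" shown is picking the smaller input as the hash join build side 
once real row counts are known.

```rust
// Hypothetical toy types for illustration only; not DataFusion APIs.

/// A toy plan node. `Breaker` marks a pipeline-breaking stage
/// (e.g. a sort or an aggregation) that must consume its whole input.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { name: String, estimated_rows: usize },
    Breaker { input: Box<Plan>, observed_rows: Option<usize> },
    HashJoin { build: Box<Plan>, probe: Box<Plan> },
}

impl Plan {
    /// Best known row count: statistics observed at a finished
    /// breaker take precedence over planner estimates.
    fn rows(&self) -> usize {
        match self {
            Plan::Scan { estimated_rows, .. } => *estimated_rows,
            Plan::Breaker { input, observed_rows } => match observed_rows {
                Some(n) => *n,
                None => input.rows(),
            },
            // Crude placeholder for a join cardinality estimate.
            Plan::HashJoin { build, probe } => build.rows().max(probe.rows()),
        }
    }
}

/// Record the actual row count once a pipeline-breaking stage has finished.
fn finish_breaker(plan: &mut Plan, rows: usize) {
    if let Plan::Breaker { observed_rows, .. } = plan {
        *observed_rows = Some(rows);
    }
}

/// Re-optimize the not-yet-executed part of the plan using the best
/// statistics available: here, only "smaller side becomes the build side".
fn reoptimize(plan: Plan) -> Plan {
    match plan {
        Plan::HashJoin { build, probe } => {
            let build = reoptimize(*build);
            let probe = reoptimize(*probe);
            if build.rows() > probe.rows() {
                Plan::HashJoin { build: Box::new(probe), probe: Box::new(build) }
            } else {
                Plan::HashJoin { build: Box::new(build), probe: Box::new(probe) }
            }
        }
        Plan::Breaker { input, observed_rows } => Plan::Breaker {
            input: Box::new(reoptimize(*input)),
            observed_rows,
        },
        scan => scan,
    }
}
```

   The intended flow would be: run the stages up to a breaker, call something 
like `finish_breaker` with the observed row count, then call `reoptimize` on 
the remaining plan before continuing. The same hook is where join reordering 
or dynamic filter pushdown could plug in.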
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


