Re: [I] Add AQE to DataFusion [datafusion]

via GitHub Fri, 26 Jun 2026 01:48:02 -0700


asolimando commented on issue #23194:
URL: https://github.com/apache/datafusion/issues/23194#issuecomment-4807999502


   Similar work for distributed-datafusion from @gabotechs: 
https://github.com/datafusion-contrib/datafusion-distributed/pull/486
   
   I have only looked at the issue and PR descriptions so far, but I can 
already see many commonalities:
   - pipeline breakers/boundaries are a good place to accumulate runtime 
statistics
   - the built-in statistics propagation mechanism is re-used as is, only fed 
with runtime statistics instead of static estimates
   - runtime statistics must be sampled, since we are in a streaming 
computational model (the idea behind `SamplerExec` in the above PR, and the 
similar buffer node here)
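
   To make the third point concrete, here is a minimal sketch of what 
"buffer a prefix of the stream and derive statistics from it" could look like. 
Everything in it is hypothetical: `Batch`, `SampledStats` and `SamplingBuffer` 
are illustrative names, not the `SamplerExec` from the PR above and not any 
DataFusion API, and a batch is reduced to a single column of keys.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a record batch: a single column of keys.
type Batch = Vec<i64>;

/// Statistics derived from the buffered prefix of a stream.
#[derive(Debug, Clone, PartialEq)]
pub struct SampledStats {
    pub rows_sampled: usize,
    pub distinct_keys: usize,
    /// True only if the input ended inside the sample budget, in which
    /// case the counts are exact rather than estimates.
    pub exact: bool,
}

/// Buffers batches at a pipeline boundary until roughly `max_rows` rows
/// have been seen, tracking simple statistics along the way.
pub struct SamplingBuffer {
    max_rows: usize,
    buffered: Vec<Batch>,
    seen: HashSet<i64>,
    rows: usize,
}

impl SamplingBuffer {
    pub fn new(max_rows: usize) -> Self {
        Self {
            max_rows,
            buffered: Vec::new(),
            seen: HashSet::new(),
            rows: 0,
        }
    }

    /// Records one batch. Returns `true` while the buffer still wants
    /// more input, `false` once the sample budget has been reached.
    pub fn push(&mut self, batch: Batch) -> bool {
        self.rows += batch.len();
        self.seen.extend(batch.iter().copied());
        self.buffered.push(batch);
        self.rows < self.max_rows
    }

    /// Snapshot of what has been observed so far.
    pub fn stats(&self, input_exhausted: bool) -> SampledStats {
        SampledStats {
            rows_sampled: self.rows,
            distinct_keys: self.seen.len(),
            exact: input_exhausted,
        }
    }

    /// Hands the buffered batches back in arrival order so they can be
    /// replayed downstream; sampling never drops data.
    pub fn drain(self) -> Vec<Batch> {
        self.buffered
    }
}
```

   The point of the sketch is only that the sampled numbers could then be fed 
into the existing statistics propagation in place of static estimates, while 
the buffered batches are replayed unchanged.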
   
   I wonder how much can be re-used across core DF, distributed DF, and 
Ballista. The challenges differ, and the same logical concept takes a 
different form in each of the three cases, but the mechanism seems very 
similar, if not identical.
   
   WDYT?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

