gabotechs commented on issue #23194: URL: https://github.com/apache/datafusion/issues/23194#issuecomment-4841887365
Answering @avantgardnerio here:

> its oneshot→gRPC delivery becomes the StatsListener impl — and DF picks up the observation primitive natively. Do you think this would be a fruitful direction? If you'd like to merge in SamplerExec then DF would have the perfect slot for it. Or if you prefer, the work would be lifted directly with attribution.

`SamplerExec` is really just the tip of the iceberg. It's only useful in the context of the coordination logic of `datafusion-distributed`, where a coordinating entity needs to be responsible for:

1. Kicking it off eagerly so that runtime sampling starts before execution ([done here](https://github.com/datafusion-contrib/datafusion-distributed/blob/2203b3bfdd93b1ad258e60a6dc668c9911b7b9ff/src/worker/impl_coordinator_channel.rs#L104-L106)).
2. Collecting the runtime samples, reconciling the samples coming from the different partitions, and building a `datafusion::physical_plan::Statistics` out of the reconciled samples ([done here](https://github.com/datafusion-contrib/datafusion-distributed/blob/2203b3bfdd93b1ad258e60a6dc668c9911b7b9ff/src/coordinator/prepare_dynamic_plan.rs#L195-L195)).
3. Re-arranging the not-yet-sampled remainder of the plan based on the `datafusion::physical_plan::Statistics` collected at runtime ([done here](https://github.com/datafusion-contrib/datafusion-distributed/blob/0ac54f00f70d6d9c56ffb43534227635eadc08a3/src/coordinator/prepare_dynamic_plan.rs#L70-L70)).

So unless it's decided that the `datafusion-distributed` coordinating machinery is general enough to also be ported to `datafusion`, I think there's not much to gain from moving just `SamplerExec` to `datafusion`, as I don't think it would generalize well to other coordinating machineries.
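To make the three coordinator responsibilities concrete, here is a toy, self-contained sketch under stated assumptions; it is not how `datafusion-distributed` implements them. Every name is hypothetical: `PartitionSample`, `EstimatedStats` (a stand-in for `datafusion::physical_plan::Statistics`), `sample_eagerly`, `reconcile`, and `choose_build_side`. Sampling starts eagerly on one thread per partition and results are pushed over a `std::sync::mpsc` channel (standing in for the oneshot→gRPC delivery). The coordinator then reconciles the samples by extrapolating each one by the fraction of input it covers and summing. Finally it makes one example re-planning decision, picking the smaller side as a hash-join build side.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical per-partition runtime sample (not a DataFusion type).
#[derive(Debug, Clone, Copy)]
struct PartitionSample {
    partition: usize,
    rows_sampled: u64,
    bytes_sampled: u64,
    /// Fraction of the partition's input covered by the sample, in (0, 1].
    fraction: f64,
}

/// Hypothetical stand-in for `datafusion::physical_plan::Statistics`.
#[derive(Debug, PartialEq)]
struct EstimatedStats {
    num_rows: u64,
    total_byte_size: u64,
}

/// Hypothetical re-planning outcome: which join input becomes the build side.
#[derive(Debug, PartialEq)]
enum BuildSide {
    Left,
    Right,
}

/// Step 1 (toy): start sampling eagerly, one thread per partition. Each
/// partition sends its sample to the coordinator over a channel.
fn sample_eagerly(inputs: Vec<(u64, u64, f64)>) -> Vec<PartitionSample> {
    let (tx, rx) = mpsc::channel();
    for (partition, (rows, bytes, fraction)) in inputs.into_iter().enumerate() {
        let tx = tx.clone();
        thread::spawn(move || {
            let sample = PartitionSample { partition, rows_sampled: rows, bytes_sampled: bytes, fraction };
            tx.send(sample).expect("coordinator hung up");
        });
    }
    // Drop the original sender so `rx.iter()` ends once every partition has reported.
    drop(tx);
    let mut samples: Vec<PartitionSample> = rx.iter().collect();
    samples.sort_by_key(|s| s.partition);
    samples
}

/// Step 2 (toy): reconcile samples by extrapolating each partition's sample
/// to its full input and summing across partitions.
fn reconcile(samples: &[PartitionSample]) -> EstimatedStats {
    let (mut rows, mut bytes) = (0.0_f64, 0.0_f64);
    for s in samples {
        rows += s.rows_sampled as f64 / s.fraction;
        bytes += s.bytes_sampled as f64 / s.fraction;
    }
    EstimatedStats { num_rows: rows.round() as u64, total_byte_size: bytes.round() as u64 }
}

/// Step 3 (toy): re-arrange the not-yet-executed plan; here, build the hash
/// table on whichever input is estimated to be smaller.
fn choose_build_side(left: &EstimatedStats, right: &EstimatedStats) -> BuildSide {
    if left.total_byte_size <= right.total_byte_size {
        BuildSide::Left
    } else {
        BuildSide::Right
    }
}

fn main() {
    // (rows sampled, bytes sampled, fraction of input covered) per partition.
    let left = reconcile(&sample_eagerly(vec![(100, 1_000, 0.25), (50, 500, 0.5)]));
    let right = reconcile(&sample_eagerly(vec![(10, 80, 0.5), (20, 160, 1.0)]));
    let side = choose_build_side(&left, &right);
    // Left extrapolates to 5000 bytes and right to 320, so the build side is Right.
    println!("left = {:?}, right = {:?}, build side = {:?}", left, right, side);
}
```

The sketch makes the comment's point visible: the sampling operator (step 1) is the smallest piece. The reconciliation policy and the re-planning decision live in the coordinator, so they would have to move along with any sampling primitive.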
