gabotechs commented on issue #23194:
URL: https://github.com/apache/datafusion/issues/23194#issuecomment-4841887365

   Answering @avantgardnerio here:
   
   > its oneshot→gRPC delivery becomes the StatsListener impl — and DF picks up 
the observation primitive natively. Do you think this would be a fruitful 
direction? If you'd like to merge in SamplerExec then DF would have the perfect 
slot for it. Or if you prefer, the work would be lifted directly with 
attribution.
   
   `SamplerExec` is really just the tip of the iceberg. It's only useful in the 
context of the coordination logic of `datafusion-distributed`, because a 
coordinating entity needs to be responsible for:
   1) Kicking it off eagerly so that runtime sampling starts before execution 
([done 
here](https://github.com/datafusion-contrib/datafusion-distributed/blob/2203b3bfdd93b1ad258e60a6dc668c9911b7b9ff/src/worker/impl_coordinator_channel.rs#L104-L106))
   2) Collecting the runtime samples, reconciling the different samples from 
the different partitions, and building a 
`datafusion::physical_plan::Statistics` out of the reconciled samples ([done 
here](https://github.com/datafusion-contrib/datafusion-distributed/blob/2203b3bfdd93b1ad258e60a6dc668c9911b7b9ff/src/coordinator/prepare_dynamic_plan.rs#L195-L195)).
   3) Re-arranging the remaining, not-yet-sampled part of the plan based on the 
`datafusion::physical_plan::Statistics` collected at runtime ([done 
here](https://github.com/datafusion-contrib/datafusion-distributed/blob/0ac54f00f70d6d9c56ffb43534227635eadc08a3/src/coordinator/prepare_dynamic_plan.rs#L70-L70))
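
   To give a feel for what step 2 involves, here is a rough, dependency-free 
sketch of reconciling per-partition samples into a single estimate. Everything 
in it is hypothetical: `PartitionSample`, `EstimatedStats` and `reconcile` are 
illustrative names, not types from `datafusion` or `datafusion-distributed`, 
and the "scale by fraction consumed, then sum" rule is an assumption made for 
the example, not a description of what the linked code does.

```rust
/// Hypothetical sample reported by one partition (illustrative only).
#[derive(Debug, Clone, Copy)]
struct PartitionSample {
    /// Rows observed so far in this partition.
    rows_seen: u64,
    /// Bytes observed so far in this partition.
    bytes_seen: u64,
    /// Assumed fraction of the partition's input consumed so far, in (0, 1].
    fraction_consumed: f64,
}

/// Hypothetical stand-in for the row-count and byte-size parts of a
/// statistics object; `None` means "could not be estimated".
#[derive(Debug, PartialEq)]
struct EstimatedStats {
    num_rows: Option<u64>,
    total_byte_size: Option<u64>,
}

/// Extrapolates each partition's sample to the full partition and sums the
/// results. Returns unknown estimates if there are no samples or if any
/// sample has an unusable fraction (zero, negative, above 1, or NaN).
fn reconcile(samples: &[PartitionSample]) -> EstimatedStats {
    let unknown = EstimatedStats { num_rows: None, total_byte_size: None };
    if samples.is_empty() {
        return unknown;
    }
    let mut rows = 0.0_f64;
    let mut bytes = 0.0_f64;
    for s in samples {
        // Written as a negated range check so that NaN is also rejected.
        if !(s.fraction_consumed > 0.0 && s.fraction_consumed <= 1.0) {
            return unknown;
        }
        rows += s.rows_seen as f64 / s.fraction_consumed;
        bytes += s.bytes_seen as f64 / s.fraction_consumed;
    }
    EstimatedStats {
        num_rows: Some(rows.round() as u64),
        total_byte_size: Some(bytes.round() as u64),
    }
}
```

   Even in this toy form, the reconciliation needs a single place that sees 
every partition's sample, which is the role the coordinator plays.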
   
   So unless it's decided that the `datafusion-distributed` coordinating 
machinery is general enough to also be ported to `datafusion`, I don't think 
there's a lot to gain from moving just `SamplerExec` to `datafusion`, as I 
doubt it will generalize well to other coordination mechanisms.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

