asolimando commented on issue #22958: URL: https://github.com/apache/datafusion/issues/22958#issuecomment-4727193341
> > and child traversal directly inside each operator's statistics_with_args override via args.compute_child_statistics(...). This means: > > Maybe we could add the children directly to StatisticsArgs? > > We would have to add the traversal somehow -- perhaps as a visitor or something 🤔 Took a pass at this to make the discussion more concrete (currently stacked on #21815): [asolimando/datafusion@asolimando/22958-statistics-from-inputs](https://github.com/asolimando/datafusion/tree/asolimando/22958-statistics-from-inputs) Went with the shape @2010YOUY01 proposed, children passed as a dedicated arg rather than folded into `StatisticsArgs`: ```rust fn statistics_from_inputs(&self, input_stats: &[Arc<Statistics>], args: &StatisticsArgs) -> Result<Arc<Statistics>>; ``` `StatisticsContext` owns the traversal + cache (the "visitor" piece), which was previously part of `StatisticsArgs` in the latest version of #21815. There are two choices I'd like your read on: 1. Children as a separate `input_stats` arg vs. inside `StatisticsArgs`, as I felt it was goo to keep "data vs config" separate (but it's not hard to merge them). 2. Removed `statistics_with_args` outright (it's unreleased so probably OK, we can keep that method name instead of `statistics_from_inputs` if you both agree on that). `partition_statistics` stays deprecated as per #21815. This can be a separate PR or fold into #21815, whichever you prefer. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
