gruuya commented on PR #9097: URL: https://github.com/apache/arrow-datafusion/pull/9097#issuecomment-1923207665
Thanks! > I wonder if we should add a similar optimization for other nodes where computing this is non trivial (like HashJoinExec or HashAggregateExec or FilterExec) Yup, this makes sense to me. In fact `AggregateExec` also always [computes the eq_props](https://github.com/apache/arrow-datafusion/blob/8b50774d6afa04f511b56019febc62fd60d26234/datafusion/physical-plan/src/aggregates/mod.rs#L369) when created and then throws it away (thus forcing re-computing it later on). > I could imagine even changing the signature of ExecutionPlan::equivalence_properties to return a reference and basically force pre-computing the results 🤔 This also sounds good. Arguably for other nodes this might be forcing an eager computation of something that might not be used in theory, but in practice I think these two methods always get invoked during the planning process, at least to propagate the info up the plan tree to something that does cache it now (might also need some profiling/investigating here too). I can try and make these additional changes separately if other people think it could be a good idea. I also think breaking up `DefaultPhysicalPlanner::create_initial_plan`s `LogicalPlan::Join` case would be beneficial, since I think that's what's causing the stack overflow for the `tpcds_physical_q64` test (which would then also cover the present issue). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
