gruuya commented on PR #9097:
URL: 
https://github.com/apache/arrow-datafusion/pull/9097#issuecomment-1923207665

   Thanks!
   
   > I wonder if we should add a similar optimization for other nodes where 
computing this is non trivial (like HashJoinExec or HashAggregateExec or 
FilterExec)
   
   Yup, this makes sense to me. In fact `AggregateExec` also always [computes 
the 
eq_props](https://github.com/apache/arrow-datafusion/blob/8b50774d6afa04f511b56019febc62fd60d26234/datafusion/physical-plan/src/aggregates/mod.rs#L369)
 when created and then throws it away (thus forcing re-computing it later on).
   
   > I could imagine even changing the signature of 
ExecutionPlan::equivalence_properties to return a reference and basically force 
pre-computing the results 🤔
   
   This also sounds good. Arguably for other nodes this might be forcing an 
eager computation of something that might not be used in theory, but in 
practice I think these two methods always get invoked during the planning 
process, at least to propagate the info up the plan tree to something that does 
cache it now (might also need some profiling/investigating here too).
   
   I can try and make these additional changes separately if other people think 
it could be a good idea. I also think breaking up 
`DefaultPhysicalPlanner::create_initial_plan`s `LogicalPlan::Join` case would 
be beneficial, since I think that's what's causing the stack overflow for the 
`tpcds_physical_q64` test (which would then also cover the present issue).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to