alamb commented on pull request #9639: URL: https://github.com/apache/arrow/pull/9639#issuecomment-791749787
> Someone still might want to add some filter / aggregate on the dataframe, so maybe it makes sense the optimization pass only works on collect?

Ideally, in my mind, we would be able to run the optimizations twice (so we could run them on the initial call to `sql`, and then, if someone added more grouping or repartitioning or something, we could run the optimizer passes again).

@Dandandan something I have been thinking about recently (as I prepared for my talk next week on DataFusion, and while talking with @NGA-TRAN on my team at Influx) is how similar the `LogicalPlanBuilder` and `DataFrame` APIs are (in fact, `DataFrameImpl` basically calls the functions on `LogicalPlanBuilder`). I almost wonder if we should combine the two somehow... I don't have a concrete proposal now, just 🤔
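To make the "run the optimizer twice" idea concrete, here is a toy sketch. `Plan`, `PlanBuilder`, `DataFrame`, `merge_limits`, and `optimize` are all hypothetical stand-ins, not DataFusion's actual types or APIs. In this model, a rewrite pass runs once when the plan is first created (think `sql`) and again at collect time, after more operators have been appended. The `DataFrame` methods simply delegate to the builder, mirroring the observation about `DataFrameImpl`.

```rust
// Hypothetical toy logical plan (not DataFusion's LogicalPlan).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(String),
    Filter { input: Box<Plan>, predicate: String },
    Limit { input: Box<Plan>, n: usize },
}

// Hypothetical optimizer pass: collapse directly nested limits, keeping the smaller n.
// Running it again on an already-optimized plan leaves the plan unchanged.
fn merge_limits(plan: Plan) -> Plan {
    match plan {
        Plan::Limit { input, n } => match merge_limits(*input) {
            Plan::Limit { input: inner, n: m } => Plan::Limit { input: inner, n: n.min(m) },
            other => Plan::Limit { input: Box::new(other), n },
        },
        Plan::Filter { input, predicate } => Plan::Filter {
            input: Box::new(merge_limits(*input)),
            predicate,
        },
        scan => scan,
    }
}

// The full "optimizer" here is just the single pass above.
fn optimize(plan: Plan) -> Plan {
    merge_limits(plan)
}

// Hypothetical builder: each method wraps the current plan in a new node.
struct PlanBuilder {
    plan: Plan,
}

impl PlanBuilder {
    fn scan(table: &str) -> Self {
        PlanBuilder { plan: Plan::Scan(table.to_string()) }
    }
    fn filter(self, predicate: &str) -> Self {
        PlanBuilder {
            plan: Plan::Filter { input: Box::new(self.plan), predicate: predicate.to_string() },
        }
    }
    fn limit(self, n: usize) -> Self {
        PlanBuilder { plan: Plan::Limit { input: Box::new(self.plan), n } }
    }
    fn build(self) -> Plan {
        self.plan
    }
}

// Hypothetical DataFrame: a thin wrapper whose methods delegate to PlanBuilder.
struct DataFrame {
    plan: Plan,
}

impl DataFrame {
    // First optimizer run, analogous to optimizing on the initial `sql` call.
    fn new(plan: Plan) -> Self {
        DataFrame { plan: optimize(plan) }
    }
    fn filter(self, predicate: &str) -> Self {
        DataFrame { plan: PlanBuilder { plan: self.plan }.filter(predicate).build() }
    }
    fn limit(self, n: usize) -> Self {
        DataFrame { plan: PlanBuilder { plan: self.plan }.limit(n).build() }
    }
    // Second optimizer run, covering operators appended after creation.
    fn collect_plan(self) -> Plan {
        optimize(self.plan)
    }
}
```

Because the pass is idempotent, running it at both points is safe: the second run only has new work to do where the user appended operators after the first run.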
