alamb commented on pull request #9639:
URL: https://github.com/apache/arrow/pull/9639#issuecomment-791749787


   > Someone still might want to add some filter / aggregate on the dataframe, 
so maybe it makes sense the optimization pass only works on collect?
   
   Ideally, in my mind, we would be able to run the optimizations twice (so we 
could do it with the initial call to `sql`, but then if someone added more 
grouping or repartitioning or something, we could run the optimizer passes 
again).
   
   @Dandandan something I have been thinking about recently (as I prepared for 
my talk next week on DataFusion, as well as talking with @NGA-TRAN on my team 
at Influx) is how similar the `LogicalPlanBuilder` and `DataFrame` APIs are 
(in fact, the `DataFrameImpl` basically calls the functions on 
`LogicalPlanBuilder`).
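
   To make both points concrete, here is a rough, self-contained sketch. It 
uses toy stand-ins only: `Plan`, `PlanBuilder`, `Frame` and `optimize` are 
hypothetical names made up for illustration, not DataFusion's actual types or 
optimizer rules. It assumes an optimizer pass that is safe to re-run 
(idempotent), and shows a dataframe-like wrapper that simply forwards to a 
plan builder.

```rust
// Toy logical plan; NOT DataFusion's real LogicalPlan.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(String),
    Filter { predicate: String, input: Box<Plan> },
}

// Hypothetical builder: each method wraps the current plan in a new node.
struct PlanBuilder {
    plan: Plan,
}

impl PlanBuilder {
    fn scan(table: &str) -> Self {
        PlanBuilder { plan: Plan::Scan(table.to_string()) }
    }

    fn filter(self, predicate: &str) -> Self {
        PlanBuilder {
            plan: Plan::Filter {
                predicate: predicate.to_string(),
                input: Box::new(self.plan),
            },
        }
    }

    fn build(self) -> Plan {
        self.plan
    }
}

// Hypothetical dataframe: a thin wrapper that forwards to the builder.
struct Frame {
    plan: Plan,
}

impl Frame {
    fn table(name: &str) -> Self {
        Frame { plan: PlanBuilder::scan(name).build() }
    }

    fn filter(self, predicate: &str) -> Self {
        let plan = PlanBuilder { plan: self.plan }.filter(predicate).build();
        Frame { plan }
    }
}

// Toy optimizer pass: merges adjacent filters into a single conjunction.
// Running it on an already-optimized plan returns the plan unchanged.
fn optimize(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { predicate, input } => match optimize(*input) {
            Plan::Filter { predicate: inner, input: inner_input } => Plan::Filter {
                predicate: format!("({}) AND ({})", inner, predicate),
                input: inner_input,
            },
            other => Plan::Filter { predicate, input: Box::new(other) },
        },
        scan => scan,
    }
}
```

   In this sketch, a plan optimized once (say, at the initial `sql` call) can 
be wrapped in a `Frame` again, extended with another `filter`, and passed 
through `optimize` a second time; the second pass merges the new filter with 
the existing one. A real optimizer would need each rule to tolerate input that 
has already been optimized for this to work.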
   
   I almost wonder if we should combine the two somehow... I don't have a 
concrete proposal right now, just 🤔 



