jorgecarleitao commented on pull request #7880: URL: https://github.com/apache/arrow/pull/7880#issuecomment-673890773
Thank you very much @alamb for reviewing it!

This optimizer is mostly useful in the `table` or `DataFrame` API, where a view can be declared as a sequence of statements that are not optimized for execution but are organized from a logical, code-organization point of view. For example, suppose we have a DataFrame `df` that was constructed optimally, but we would like to look only at rows where `'a' > 2`. Instead of going through the code that built that DataFrame, working out where the filter should go, and placing it there, we can just write `df.filter(df['a'] > 2).collect()` and let the optimizer figure out where to place it.

I incorporated the comments above into #7879, as IMO they are part of that PR, and rebased the whole thing. I will still address your comment about not fully understanding the algorithm by adding a more extended comment, and maybe an ASCII drawing, to better explain the idea so that it is not only in my head.
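To make the idea concrete, here is a toy sketch of filter pushdown over a hypothetical three-node logical plan (`Scan`, `Projection`, `Filter`). These types and the `push_down` function are illustrative assumptions, not DataFusion's actual `LogicalPlan` or optimizer API. A filter written on top of the plan, as in `df.filter(df['a'] > 2)`, is moved below any projection that passes its column through unchanged, so it ends up closer to the scan.

```rust
// Toy filter-pushdown sketch. `Plan` and `push_down` are hypothetical
// illustrations, not DataFusion's LogicalPlan or optimizer API.

#[derive(Debug, Clone, PartialEq)]
enum Plan {
    // Read `columns` from `table`.
    Scan { table: String, columns: Vec<String> },
    // Keep only `columns` from the input (no renaming or computed columns).
    Projection { columns: Vec<String>, input: Box<Plan> },
    // Keep rows where `column > value`.
    Filter { column: String, value: i64, input: Box<Plan> },
}

/// Recursively moves each filter below projections that pass its column through.
fn push_down(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { column, value, input } => match *input {
            // The projection forwards `column` unchanged, so filtering before
            // or after it yields the same rows: swap them and keep pushing.
            Plan::Projection { columns, input: inner } if columns.contains(&column) => {
                let pushed = push_down(Plan::Filter { column, value, input: inner });
                Plan::Projection { columns, input: Box::new(pushed) }
            }
            // Otherwise leave the filter here and optimize the subtree below it.
            other => Plan::Filter { column, value, input: Box::new(push_down(other)) },
        },
        Plan::Projection { columns, input } => Plan::Projection {
            columns,
            input: Box::new(push_down(*input)),
        },
        scan @ Plan::Scan { .. } => scan,
    }
}

fn cols(names: &[&str]) -> Vec<String> {
    names.iter().map(|s| s.to_string()).collect()
}

fn main() {
    // Roughly `df.filter(df['a'] > 2)` where `df` is a projection over a scan.
    let plan = Plan::Filter {
        column: "a".to_string(),
        value: 2,
        input: Box::new(Plan::Projection {
            columns: cols(&["a", "b"]),
            input: Box::new(Plan::Scan { table: "t".to_string(), columns: cols(&["a", "b", "c"]) }),
        }),
    };
    let optimized = push_down(plan);
    // The filter now sits directly above the scan, below the projection.
    println!("{:?}", optimized);
}
```

A real optimizer also has to handle predicates over several columns, aliases, joins and aggregates, which this sketch deliberately ignores.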