nealrichardson commented on PR #34576: URL: https://github.com/apache/arrow/pull/34576#issuecomment-1472024512
Maybe I misunderstand your concerns, but here's how the R query building works. `select/mutate/filter` build and modify a projection expression and a filter expression. At scan time (`collect`), those get pushed down into the ScanNode when querying a dataset. Aggregations and joins are different: they essentially wrap the preceding parts of the query in a black box, and nothing that happens after them modifies any previous steps.

So in the case of

```
Scan -> Project[0] -> Join -> Project[1]
```

the projection of `Project[0]` is pushed into `Scan`. `Join` can only specify join keys based on the columns in `Project[0]`: if `b` isn't in `Project[0]`, you can't reference it in `Join`. We don't (currently) inspect `Project[1]` to see if further columns could be pruned from steps prior to `Join` and pushed down, i.e. `Project[1]` doesn't alter `Project[0]` and therefore doesn't alter `Scan` either.
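To make the behavior described above concrete, here is a minimal toy model in Python. It is only a sketch: the `Query` class, its `scan`/`select`/`join`/`collect_plan` methods, and the column names are all hypothetical and are not the arrow R package's or Acero's actual API. It assumes a projection that directly follows a scan is folded into the scan, and that a join seals everything before it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Query:
    """Toy lazy query (hypothetical; not arrow's real classes)."""
    columns: Tuple[str, ...]                    # columns visible at this point
    sealed: Tuple[str, ...] = ()                # nodes fixed by an earlier join
    pending: Optional[Tuple[str, ...]] = None   # projection not yet emitted

    @staticmethod
    def scan(columns):
        return Query(columns=tuple(columns))

    def select(self, *cols):
        # Only columns that are currently visible can be selected.
        missing = [c for c in cols if c not in self.columns]
        if missing:
            raise KeyError(f"unknown columns: {missing}")
        return Query(columns=tuple(cols), sealed=self.sealed, pending=tuple(cols))

    def collect_plan(self):
        if not self.sealed:
            # No join yet: the projection is pushed down into the scan.
            return (f"Scan[{', '.join(self.columns)}]",)
        if self.pending is not None:
            # After a join: emit a separate Project node, leave sealed nodes alone.
            return self.sealed + (f"Project[{', '.join(self.pending)}]",)
        return self.sealed

    def join(self, right, key):
        # Join keys must be visible on both sides at this point.
        for side in (self, right):
            if key not in side.columns:
                raise KeyError(f"join key {key!r} not available")
        right_plan = " -> ".join(right.collect_plan())
        # Seal the left side's plan; later steps never rewrite these nodes.
        sealed = self.collect_plan() + (f"Join[{key}; right: {right_plan}]",)
        merged = self.columns + tuple(c for c in right.columns if c not in self.columns)
        return Query(columns=merged, sealed=sealed)


left = Query.scan(["a", "b", "c"]).select("a", "c")    # Project[0]
right = Query.scan(["a", "x", "y"])
q = left.join(right, key="a").select("a")              # Join, then Project[1]
print(" -> ".join(q.collect_plan()))
# Scan[a, c] -> Join[a; right: Scan[a, x, y]] -> Project[a]
```

In this sketch the left scan still reads `c` even though the final projection keeps only `a`, which mirrors the point that `Project[1]` is not inspected to prune columns before the join. Likewise, `left.join(right, key="b")` raises `KeyError` because `b` was dropped by the first `select`.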
