nealrichardson commented on PR #34576:
URL: https://github.com/apache/arrow/pull/34576#issuecomment-1472024512

   Maybe I misunderstand your concerns, but here's how the R query building 
works. `select`/`mutate`/`filter` build up and modify a projection expression 
and a filter expression. At scan time (`collect`), those get pushed down into 
the ScanNode if the query is on a dataset. Aggregations and joins are 
different, in that they essentially wrap the preceding parts of the query in a 
black box, and nothing that happens after them modifies any previous steps.
   
   So in the case of 
   
   ```
   Scan -> Project[0] -> Join -> Project[1]
   ```
   
   the projection of `Project[0]` is pushed into `Scan`. `Join` can only 
specify join keys based on the columns in `Project[0]`: if `b` isn't in 
`Project[0]`, you can't reference it in `Join`. We don't (currently) inspect 
`Project[1]` to see if further columns could be pruned from steps prior to 
`Join` and pushed down, i.e. `Project[1]` doesn't alter `Project[0]`, and 
therefore doesn't alter `Scan` either.
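
To make the behavior described above concrete, here is a minimal toy model in Python. It is a sketch only, not the arrow R package's implementation: `Dataset`, `Join`, `Query`, `open_dataset`, and `scan_columns` are hypothetical names invented for illustration, filters are tracked as plain strings, and `mutate` is left out.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for a dataset: just a list of column names."""
    columns: List[str]


@dataclass(frozen=True)
class Join:
    """Black box wrapping two finished queries; nothing later reaches inside."""
    left: "Query"
    right: "Query"
    key: str


@dataclass(frozen=True)
class Query:
    """Toy lazy query: a source plus the projection/filter applied on top."""
    source: Union[Dataset, Join]
    projection: List[str]
    filters: List[str] = field(default_factory=list)

    def select(self, *cols: str) -> "Query":
        missing = [c for c in cols if c not in self.projection]
        if missing:
            raise KeyError(f"columns not available: {missing}")
        # Only this layer's projection changes; `source` is left untouched.
        return Query(self.source, list(cols), list(self.filters))

    def filter(self, expr: str) -> "Query":
        return Query(self.source, list(self.projection), self.filters + [expr])

    def join(self, other: "Query", key: str) -> "Query":
        # Join keys must be visible in the current projection of both sides.
        for side in (self, other):
            if key not in side.projection:
                raise KeyError(f"join key {key!r} not in {side.projection}")
        out = self.projection + [c for c in other.projection if c != key]
        # Both inputs are wrapped as-is; later steps start a new layer.
        return Query(Join(self, other, key), out)


def open_dataset(columns: List[str]) -> Query:
    return Query(Dataset(list(columns)), list(columns))


def scan_columns(q: Query) -> List[str]:
    """Columns the left-most Scan would read when the query is collected."""
    while isinstance(q.source, Join):
        q = q.source.left
    return q.projection


left = open_dataset(["a", "b", "c"]).select("a", "b")  # Scan -> Project[0]
right = open_dataset(["a", "x"])
joined = left.join(right, "a")                         # Join
final = joined.select("a", "x")                        # Project[1]

print(scan_columns(final))  # ['a', 'b']: `b` is still read by the scan
```

In this model, `Project[1]` drops `b` from the final result, but the left scan still reads `a` and `b`, because nothing after the join rewrites `Project[0]`. Calling `left.join(right, "c")` raises a `KeyError`, since `c` was already removed by `Project[0]`.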

