thisisnic commented on code in PR #15077:
URL: https://github.com/apache/arrow/pull/15077#discussion_r1060395135
##########
r/R/dplyr-join.R:
##########
@@ -72,7 +72,18 @@ right_join.arrow_dplyr_query <- function(x,
suffix = c(".x", ".y"),
...,
keep = FALSE) {
- do_join(x, y, by, copy, suffix, ..., keep = keep, join_type = "RIGHT_OUTER")
+
+ # Initially keep join keys so we can coalesce them after when keep=FALSE
+ query <- do_join(x, y, by, copy, suffix, ..., keep = TRUE, join_type = "RIGHT_OUTER")
+
+ # If we are doing a right outer join and not keeping the join keys of
+ # both sides, we need to coalesce. Otherwise, rows that exist in the
+ # RHS will have NAs for the join keys.
+ if (!keep) {
+ query$selected_columns <- post_join_projection(names(x), names(y), handle_join_by(by, x, y), suffix)
+ }
Review Comment:
I tried the approach you suggested, but I think I still need the
`post_join_projection()` call. If I revert my changes and apply only the
changes suggested above, my test fails: the schema expects
`some_grouping.x` and `some_grouping.y` columns to have been created, but they
have not been.
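
For context, here is a minimal sketch of why the coalesce step matters. It uses plain dplyr on in-memory tibbles rather than the arrow code path. The tibbles and the `a`/`b` columns are made up for illustration, and it does not show what `post_join_projection()` does internally:

```r
library(dplyr)

# Hypothetical inputs: key 2 is in both tables, key 3 only in y
x <- tibble(some_grouping = c(1, 2), a = c("x1", "x2"))
y <- tibble(some_grouping = c(2, 3), b = c("y2", "y3"))

# keep = TRUE retains both copies of the join key, suffixed .x and .y
joined <- right_join(x, y, by = "some_grouping", keep = TRUE)

# Rows that exist only in y have NA in some_grouping.x, so take the
# first non-missing value across the two key columns
result <- joined %>%
  mutate(some_grouping = coalesce(some_grouping.x, some_grouping.y)) %>%
  select(some_grouping, a, b)
```

Without the coalesce, taking the key from the left side alone would leave an `NA` key on the row that has no match in `x`.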