wjones127 commented on code in PR #15077:
URL: https://github.com/apache/arrow/pull/15077#discussion_r1057907207
##########
r/tests/testthat/test-dplyr-join.R:
##########
@@ -371,3 +371,24 @@ test_that("full joins handle keep", {
)
}
})
+
+test_that("right_join correctly coalesces keys", {
Review Comment:
Instead of another test, could we modify the existing `left` and `to_join`
to have keys that don't fully overlap?
https://github.com/apache/arrow/blob/785ab5f3c3aa4e5f2e30655d4765ca729bee7aa2/r/tests/testthat/test-dplyr-join.R#L20-L27
Then we get coverage from:
https://github.com/apache/arrow/blob/785ab5f3c3aa4e5f2e30655d4765ca729bee7aa2/r/tests/testthat/test-dplyr-join.R#L143-L152
##########
r/R/dplyr-join.R:
##########
@@ -72,7 +72,18 @@ right_join.arrow_dplyr_query <- function(x,
suffix = c(".x", ".y"),
...,
keep = FALSE) {
- do_join(x, y, by, copy, suffix, ..., keep = keep, join_type = "RIGHT_OUTER")
+
+ # Initially keep join keys so we can coalesce them after when keep=FALSE
+ query <- do_join(x, y, by, copy, suffix, ..., keep = TRUE, join_type =
"RIGHT_OUTER")
+
+ # If we are doing a right outer join and not keeping the join keys of
+ # both sides, we need to coalesce. Otherwise, rows that exist in the
+ # RHS will have NAs for the join keys.
+ if (!keep) {
+ query$selected_columns <- post_join_projection(names(x), names(y),
handle_join_by(by, x, y), suffix)
+ }
Review Comment:
This works, but it might be straightforward to fix here:
https://github.com/apache/arrow/blob/785ab5f3c3aa4e5f2e30655d4765ca729bee7aa2/r/R/dplyr-join.R#L37-L43
in the case of right join, we need to do the `setdiff` to produce the
`left_output` and leave `right_output` as `names(y)`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]