jonkeane commented on code in PR #48630:
URL: https://github.com/apache/arrow/pull/48630#discussion_r2649177582
##########
r/tests/testthat/test-dplyr-join.R:
##########
@@ -188,8 +188,41 @@ test_that("Error handling for unsupported expressions in
join_by", {
)
})
-# TODO: test duplicate col names
-# TODO: casting: int and float columns?
+test_that("joins with duplicate column names", {
+ # When column names are duplicated (not in by), suffixes are added
+ left_dup <- tibble::tibble(
+ x = 1:5,
+ y = 1:5,
+ z = letters[1:5]
+ )
+ right_dup <- tibble::tibble(
+ x = 1:5,
+ y = 6:10,
+ z = LETTERS[1:5]
+ )
+
+ compare_dplyr_binding(
+ .input |>
+ left_join(right_dup, by = "x") |>
+ collect(),
+ left_dup
+ )
+
+ compare_dplyr_binding(
+ .input |>
+ inner_join(right_dup, by = "x") |>
+ collect(),
+ left_dup
+ )
+
+ # Test with custom suffixes
+ compare_dplyr_binding(
+ .input |>
+ left_join(right_dup, by = "x", suffix = c("_left", "_right")) |>
+ collect(),
+ left_dup
+ )
+})
Review Comment:
This test especially is effectively duplicated by
https://github.com/apache/arrow/blob/744f0ec2cf9f8716fcea408d67ede9c14a7e6954/r/tests/testthat/test-dplyr-join.R#L320-L338
Though I will admit that the tests there do not follow the established
pattern you follow here. Would you mind updating merging these two tests
together? You can use your fixture for that purpose if you like, but if you do,
would you mind adding a column with floats and also making the key column more
obvious (e.g. by naming it `key` or `to_join` or something like that?
I'm happy to help describe in more detail if that's helpful!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]