nealrichardson commented on code in PR #14160: URL: https://github.com/apache/arrow/pull/14160#discussion_r975171976
########## r/tests/testthat/test-dplyr-collect.R: ########## @@ -276,3 +276,16 @@ test_that("collapse doesn't unnecessarily add ProjectNodes", { ) expect_length(grep("ProjectNode", plan), 1) }) + +test_that("compute", { Review Comment: I would put this test in `test-dplyr-query.R` and rename this file back to `test-dplyr-collapse.R` since it's all about `collapse()`. I get the point of trying to have the R and test file names matching, but it's not so straightforward here--nearly every dplyr test we have calls `collect()` so we're testing it everywhere. Also let's give the test a more expressive name ```suggestion test_that("compute() on a grouped query returns a Table with groups in metadata", { ``` ########## r/tests/testthat/test-dplyr-query.R: ########## @@ -119,7 +119,7 @@ test_that("collect(as_data_frame=FALSE)", { filter(int > 5) %>% group_by(int) %>% collect(as_data_frame = FALSE) - expect_s3_class(b4, "arrow_dplyr_query") Review Comment: I think we need a test (and matching code change) that a Table with groups in the `$metadata` prints that it is grouped. Currently, the print method does not reveal that you have groupings, even though they get restored when you `as.data.frame()`: ``` > tab <- arrow_table(mtcars) > tab$metadata$r$attributes$.group_vars <- "cyl" > tab Table 32 rows x 11 columns $mpg <double> $cyl <double> $disp <double> $hp <double> $drat <double> $wt <double> $qsec <double> $vs <double> $am <double> $gear <double> $carb <double> See $metadata for additional Schema metadata ``` I wonder what else is lost or concealed by this change. This feature of reading groups from `$metadata$r$attributes$.group_vars` was a bit of a hack and not something we're generally relying on, so I don't think it's a simple change. Or at least, it's not something we can just rely that the test suite already covers everything we care about. (Search for ".group_vars" in the code, it's not touched in many places, and only in one test.) Not a reason not to do it, just need to be thorough. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org