nealrichardson commented on code in PR #14160:
URL: https://github.com/apache/arrow/pull/14160#discussion_r975171976


##########
r/tests/testthat/test-dplyr-collect.R:
##########
@@ -276,3 +276,16 @@ test_that("collapse doesn't unnecessarily add 
ProjectNodes", {
   )
   expect_length(grep("ProjectNode", plan), 1)
 })
+
+test_that("compute", {

Review Comment:
   I would put this test in `test-dplyr-query.R` and rename this file back to 
`test-dplyr-collapse.R` since it's all about `collapse()`. I get the point of 
trying to have the R and test file names matching, but it's not so 
straightforward here--nearly every dplyr test we have calls `collect()` so 
we're testing it everywhere.
   
   Also let's give the test a more expressive name
   
   ```suggestion
   test_that("compute() on a grouped query returns a Table with groups in 
metadata", {
   ```



##########
r/tests/testthat/test-dplyr-query.R:
##########
@@ -119,7 +119,7 @@ test_that("collect(as_data_frame=FALSE)", {
     filter(int > 5) %>%
     group_by(int) %>%
     collect(as_data_frame = FALSE)
-  expect_s3_class(b4, "arrow_dplyr_query")

Review Comment:
   I think we need a test (and matching code change) that a Table with groups 
in the `$metadata` prints that it is grouped. Currently, the print method does 
not reveal that you have groupings, even though they get restored when you 
`as.data.frame()`:
   
   ```
   > tab <- arrow_table(mtcars)
   > tab$metadata$r$attributes$.group_vars <- "cyl"
   > tab
   Table
   32 rows x 11 columns
   $mpg <double>
   $cyl <double>
   $disp <double>
   $hp <double>
   $drat <double>
   $wt <double>
   $qsec <double>
   $vs <double>
   $am <double>
   $gear <double>
   $carb <double>
   
   See $metadata for additional Schema metadata
   ```
   
   I wonder what else is lost or concealed by this change. This feature of 
reading groups from `$metadata$r$attributes$.group_vars` was a bit of a hack 
and not something we're generally relying on, so I don't think it's a simple 
change. Or at least, it's not something we can just rely that the test suite 
already covers everything we care about. (Search for ".group_vars" in the code, 
it's not touched in many places, and only in one test.)
   
   Not a reason not to do it, just need to be thorough.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to