nealrichardson commented on code in PR #14160:
URL: https://github.com/apache/arrow/pull/14160#discussion_r975171976
##########
r/tests/testthat/test-dplyr-collect.R:
##########
@@ -276,3 +276,16 @@ test_that("collapse doesn't unnecessarily add
ProjectNodes", {
)
expect_length(grep("ProjectNode", plan), 1)
})
+
+test_that("compute", {
Review Comment:
I would put this test in `test-dplyr-query.R` and rename this file back to
`test-dplyr-collapse.R` since it's all about `collapse()`. I get the point of
trying to have the R and test file names matching, but it's not so
straightforward here--nearly every dplyr test we have calls `collect()` so
we're testing it everywhere.
Also let's give the test a more expressive name
```suggestion
test_that("compute() on a grouped query returns a Table with groups in
metadata", {
```
##########
r/tests/testthat/test-dplyr-query.R:
##########
@@ -119,7 +119,7 @@ test_that("collect(as_data_frame=FALSE)", {
filter(int > 5) %>%
group_by(int) %>%
collect(as_data_frame = FALSE)
- expect_s3_class(b4, "arrow_dplyr_query")
Review Comment:
I think we need a test (and matching code change) that a Table with groups
in the `$metadata` prints that it is grouped. Currently, the print method does
not reveal that you have groupings, even though they get restored when you
`as.data.frame()`:
```
> tab <- arrow_table(mtcars)
> tab$metadata$r$attributes$.group_vars <- "cyl"
> tab
Table
32 rows x 11 columns
$mpg <double>
$cyl <double>
$disp <double>
$hp <double>
$drat <double>
$wt <double>
$qsec <double>
$vs <double>
$am <double>
$gear <double>
$carb <double>
See $metadata for additional Schema metadata
```
I wonder what else is lost or concealed by this change. This feature of
reading groups from `$metadata$r$attributes$.group_vars` was a bit of a hack
and not something we're generally relying on, so I don't think it's a simple
change. Or at least, it's not something we can just rely that the test suite
already covers everything we care about. (Search for ".group_vars" in the code,
it's not touched in many places, and only in one test.)
Not a reason not to do it, just need to be thorough.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]