jonkeane commented on code in PR #13541:
URL: https://github.com/apache/arrow/pull/13541#discussion_r918360302
##########
r/tests/testthat/test-dplyr-query.R:
##########
@@ -293,3 +293,23 @@ test_that("No duplicate field names are allowed in an
arrow_dplyr_query", {
)
)
})
+
+test_that("show_exec_plan()", {
+ expect_snapshot(
+ tbl %>%
+ arrow_table() %>%
+ filter(dbl > 2, chr != "e") %>%
+ select(chr, int, lgl) %>%
+ mutate(int_plus_ten = int + 10) %>%
+ show_exec_plan()
+ )
+
+ expect_snapshot(
+ tbl %>%
+ record_batch() %>%
+ filter(dbl > 2, chr != "e") %>%
+ select(chr, int, lgl) %>%
+ mutate(int_plus_ten = int + 10) %>%
+ show_exec_plan()
+ )
+})
Review Comment:
It would be nice to have tests for the following as well (they might be
somewhere overkill, see the paragraph below this, but those go through enough
other machinery in R that it would be good to confirm that they don't fail):
* a `summarise()`
* a join (it could even be a self join)
* a dataset querying multiple files (this one might be better in the
datasets test file since the setup is already done there...)
You also might consider asserting specific limited things about the output
but not the whole output in a snapshot (e.g. does it include the columns
referenced, does it have the boiler plate we expect won't change like `ExecPlan
.* nodes`). The content of the output is more or less directly from C++, so if
that were to change we would need to update these tests even if the C++ <-> R
code hasn't changed at all.
##########
r/R/dplyr.R:
##########
@@ -219,6 +219,31 @@ tail.arrow_dplyr_query <- function(x, n = 6L, ...) {
x
}
+#' Show the details of an Arrow ExecPlan
+#'
+#' This is a function which gives more details about the `ExecPlan` of an
+#' `arrow_dplyr_query` object. It is similar to `dplyr::show_query()`.
+#'
+#' @param x an `arrow_dplyr_query` to print the ExecPlan for.
+#'
+#' @return The argument, invisibly.
+#' @export
+#'
+#' @examples
Review Comment:
We probably want to mark this as:
```suggestion
#' @examplesIf arrow_with_dataset() & requireNamespace("dplyr", quietly =
TRUE)
```
But definitely the {dplyr} bit, since we don't depend on {dplyr} (only
suggests) and CRAN will be upset if the example fails if {dplyr} isn't
available.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]