paleolimbot commented on code in PR #13541:
URL: https://github.com/apache/arrow/pull/13541#discussion_r923720163


##########
r/tests/testthat/test-dplyr-query.R:
##########
@@ -433,3 +433,343 @@ test_that("query_can_stream()", {
       query_can_stream()
   )
 })
+
+test_that("show_exec_plan(), show_query() and explain()", {
+  # minimal test - this fails if we don't coerce the input to 
`show_exec_plan()`
+  # to be an `arrow_dplyr_query`
+  expect_output(

Review Comment:
   Can you use `expect_snapshot()` for these? Or is the output not the same 
across platforms?



##########
r/tests/testthat/test-dataset-dplyr.R:
##########
@@ -340,3 +340,82 @@ test_that("dplyr method not implemented messages", {
     fixed = TRUE
   )
 })
+
+test_that("show_exec_plan(), show_query() and explain() with datasets", {
+  ds <- open_dataset(dataset_dir, partitioning = schema(part = uint8()))

Review Comment:
   See below - I think you should try `expect_snapshot()` to make these less 
verbose.
   
   I also think that you don't need this many tests - just enough to cover the 
code you've added. If you extract `ExecPlan_prepare()`, which I think you 
should, then you only need one or two tests per verb (in my opinion). To me as 
a reviewer, the extra tests imply extra code paths which may be misleading.



##########
r/R/dplyr.R:
##########
@@ -219,6 +219,45 @@ tail.arrow_dplyr_query <- function(x, n = 6L, ...) {
   x
 }
 
+#' Show the details of an Arrow Execution Plan
+#'
+#' This is a function which gives more details about the logical query plan
+#' that will be executed when evaluating an `arrow_dplyr_query` object.
+#' It calls the C++ `ExecPlan` object's print method.
+#' Functionally, it is similar to `dplyr::explain()`.
+#'
+#' @param x an `arrow_dplyr_query` to print the `ExecPlan` for.
+#'
+#' @return `x`, invisibly.
+#' @export
+#'
+#' @examplesIf arrow_with_dataset() & requireNamespace("dplyr", quietly = TRUE)

Review Comment:
   ```suggestion
   #' @examplesIf arrow_with_dataset() && requireNamespace("dplyr", quietly = 
TRUE)
   ```



##########
r/R/query-engine.R:
##########
@@ -259,9 +260,40 @@ ExecPlan <- R6Class("ExecPlan",
         ...
       )
     },
+    ToString = function() ExecPlan_ToString(self),

Review Comment:
   Is there any circumstance in which `ToString()` should return something that 
isn't `ToStringWithSink()`? I would personally expect `ToString()` to always 
return the exec plan that will be executed, in other words what you have above:
   
   ```r
     adq <- as_adq(x)
     plan <- ExecPlan$create()
     final_node <- plan$Build(adq)
     plan$ToStringWithSink(final_node)
   ```
   
   You could also remove this function since I don't think the output gets used 
anywhere (unless I'm missing some tests).



##########
r/R/query-engine.R:
##########
@@ -259,9 +260,40 @@ ExecPlan <- R6Class("ExecPlan",
         ...
       )
     },
+    ToString = function() ExecPlan_ToString(self),
+    # SinkNodes (involved in arrange and/or head/tail operations) are created 
in
+    # ExecPlan_run and are not captured by the regulat print method. We take a

Review Comment:
   ```suggestion
       # ExecPlan_run and are not captured by the regular print method. We take 
a
   ```



##########
r/src/compute-exec.cpp:
##########
@@ -125,6 +125,46 @@ std::shared_ptr<arrow::Schema> ExecNode_output_schema(
   return node->output_schema();
 }
 
+// [[arrow::export]]
+std::string ExecPlan_ToString(const std::shared_ptr<compute::ExecPlan>& plan) {
+  return plan->ToString();
+}

Review Comment:
   See above comment - I think you should remove this since the output would be 
misleading (alternatively, provide a way to access this value in 
`show_exec_plan()` and document why it is useful).



##########
r/src/compute-exec.cpp:
##########
@@ -125,6 +125,46 @@ std::shared_ptr<arrow::Schema> ExecNode_output_schema(
   return node->output_schema();
 }
 
+// [[arrow::export]]
+std::string ExecPlan_ToString(const std::shared_ptr<compute::ExecPlan>& plan) {
+  return plan->ToString();
+}
+
+// [[arrow::export]]
+std::string ExecPlan_ToStringWithSink(
+    const std::shared_ptr<compute::ExecPlan>& plan,
+    const std::shared_ptr<compute::ExecNode>& final_node, cpp11::list 
sort_options,
+    int64_t head = -1) {

Review Comment:
   There's some duplication here that I'm worried somebody might forget when 
updating ExecPlan_run. One strategy is to add a comment to ExecPlan_run (and 
here) noting that the two implementations must be kept in sync. Probably better 
is to copy part of what I started in ARROW-16444 ( 
https://github.com/paleolimbot/arrow/blob/r-udf/r/src/compute-exec.cpp#L60-L91 
) that extracts the common functionality into one function.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to