nealrichardson commented on a change in pull request #11373:
URL: https://github.com/apache/arrow/pull/11373#discussion_r725247172
##########
File path: r/R/query-engine.R
##########
@@ -175,9 +183,20 @@ ExecPlan <- R6Class("ExecPlan",
temp_columns = names(.data$temp_columns)
)
}
+
+ # This is only safe because we are going to evaluate queries that end
+ # with head/tail first, then evaluate any subsequent query as a new query
+ if (!is.null(.data$head)) {
+ node$head <- .data$head
+ }
+ if (!is.null(.data$tail)) {
+ node$tail <- .data$tail
+ }
+
node
},
Run = function(node) {
+ # TODO: pass head/tail to ExecPlan_run
Review comment:
```suggestion
# TODO (ARROW-12763): pass head/tail to ExecPlan_run
```
##########
File path: r/R/dplyr-collect.R
##########
@@ -19,6 +19,20 @@
# The following S3 methods are registered on load if dplyr is present
collect.arrow_dplyr_query <- function(x, as_data_frame = TRUE, ...) {
+ # head and tail are not ExecNodes, at best we can handle them via sink node
+ # so if there are any steps done after head/tail, we need to
+ # evaluate the query up to then and then do a new query for the rest
+ if (is_collapsed(x) && has_head_tail(x$.data)) {
+ x$.data <- as_adq(dplyr::compute(x$.data))
+ }
+ # TODO: if there are no aggregations or joins or sorting, (and there is
Review comment:
I will do this in this PR (or maybe not necessary if I can use the
Scanner interface in do_exec_plan)
##########
File path: r/R/dplyr.R
##########
@@ -162,22 +162,36 @@ as.data.frame.arrow_dplyr_query <- function(x, row.names
= NULL, optional = FALS
#' @export
head.arrow_dplyr_query <- function(x, n = 6L, ...) {
- # TODO (ARROW-13893): refactor
- out <- head.Dataset(x, n, ...)
- restore_dplyr_features(out, x)
+ # TODO: test that nrow on collapsed query with agg/join/head
Review comment:
Will do now
##########
File path: r/R/query-engine.R
##########
@@ -20,6 +20,14 @@ do_exec_plan <- function(.data) {
final_node <- plan$Build(.data)
tab <- plan$Run(final_node)$read_table()
+ # TODO: can we move all of this logic into plan$Run, making a "source" node?
+ if (!is.null(final_node$head)) {
+ # TODO: implement head off of the RecordBatchReader so you can bail early
Review comment:
I will do this one here, will try going through
ScannerBuilder::FromRecordBatchReader and use the existing head/tail methods
there so as not to reimplement that logic. Assuming that works, I may be able
to pull this into plan$Run (and do Scanner::ToRecordBatchReader on the other
side)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]