wjones127 commented on code in PR #13563:
URL: https://github.com/apache/arrow/pull/13563#discussion_r918118858
########## r/R/dplyr.R: ##########
@@ -276,13 +278,48 @@ source_data <- function(x) {
   }
 }
 
-is_collapsed <- function(x) inherits(x$.data, "arrow_dplyr_query")
+all_sources <- function(x) {
+  if (is.null(x)) {
+    x
+  } else if (!inherits(x, "arrow_dplyr_query")) {
+    list(x)
+  } else {
+    c(
+      all_sources(x$.data),
+      all_sources(x$join$right_data),
+      all_sources(x$union_all$right_data)
+    )
+  }
+}
 
-has_aggregation <- function(x) {
-  # TODO: update with joins (check right side data too)
-  !is.null(x$aggregations) || (is_collapsed(x) && has_aggregation(x$.data))
+query_can_stream <- function(x) {

Review Comment:
   > We could build an ExecPlan, but it wouldn't tell us anything about how it would perform, would it?
   
   I'm not super close to the ExecPlan code, but I thought ExecPlans were composed of a graph of nodes that could be traversed and analyzed, just like our `arrow_dplyr_query` structure. Am I wrong on that?
   
   > I'm trying to detect cases where I can just take head() of the data without having to scan an entire dataset.
   
   I was just thinking that having such a method on `ExecPlan` would be useful in general.
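
(For illustration only, not part of the PR: a minimal sketch that copies the `all_sources()` helper from the hunk above and applies it to a toy in-memory query, to show how it walks the `arrow_dplyr_query` tree and collects the leaf sources. The table and query below are made up for the example.)

```r
library(arrow)
library(dplyr)

# Copied from the diff above: recursively collect every leaf source feeding
# a query, including the right-hand sides of joins and union_all.
all_sources <- function(x) {
  if (is.null(x)) {
    x
  } else if (!inherits(x, "arrow_dplyr_query")) {
    list(x)
  } else {
    c(
      all_sources(x$.data),
      all_sources(x$join$right_data),
      all_sources(x$union_all$right_data)
    )
  }
}

# A toy query over an in-memory Table (standing in for a Dataset).
tab <- Table$create(x = 1:5, y = letters[1:5])
q <- tab %>% filter(x > 2) %>% select(x)  # an arrow_dplyr_query

srcs <- all_sources(q)
length(srcs)  # 1: the only leaf source is `tab`
```

For a query built with a join or union_all, the right-hand side's sources would appear in the returned list as well, which addresses the TODO in the old `has_aggregation()` about checking right-side data.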