jonkeane commented on a change in pull request #10992:
URL: https://github.com/apache/arrow/pull/10992#discussion_r698715025



##########
File path: r/R/query-engine.R
##########
@@ -42,11 +55,73 @@ ExecPlan <- R6Class("ExecPlan",
       }
       # ScanNode needs the filter to do predicate pushdown and skip partitions,
       # and it needs to know which fields to materialize (and which are 
unnecessary)
-      ExecNode_Scan(self, dataset, filter, colnames)
+      ExecNode_Scan(self, dataset, filter, colnames %||% character(0))
+    },
+    Build = function(.data) {
+      # This method takes an arrow_dplyr_query and chains together the
+      # ExecNodes that they produce. It does not evaluate them--that is Run().
+      group_vars <- dplyr::group_vars(.data)
+      grouped <- length(group_vars) > 0
+
+      # Collect the target names first because we have to add back the group 
vars
+      target_names <- names(.data)
+      .data <- ensure_group_vars(.data)
+      .data <- ensure_arrange_vars(.data) # this sets .data$temp_columns
+
+      node <- self$Scan(.data)
+      # ARROW-13498: Even though Scan takes the filter, apparently we have to 
do it again
+      if (inherits(.data$filtered_rows, "Expression")) {
+        node <- node$Filter(.data$filtered_rows)
+      }
+      # If any columns are derived we need to Project (otherwise this may be 
no-op)
+      node <- node$Project(c(.data$selected_columns, .data$temp_columns))
+
+      if (length(.data$aggregations)) {
+        if (grouped) {
+          # We need to prefix all of the aggregation function names with 
"hash_"
+          .data$aggregations <- lapply(.data$aggregations, function(x) {
+            x[["fun"]] <- paste0("hash_", x[["fun"]])
+            x
+          })
+        }
+
+        node <- node$Aggregate(
+          options = .data$aggregations,
+          target_names = target_names,
+          out_field_names = names(.data$aggregations),
+          key_names = group_vars
+        )
+
+        if (grouped) {
+          # The result will have result columns first then the grouping cols.
+          # dplyr orders group cols first, so adapt the result to meet that 
expectation.

Review comment:
       Oh, nevermind I see lines 114-117 do that sorting. I should have read 
lower and not stopped at the comment 🤦 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to