nealrichardson commented on a change in pull request #10888:
URL: https://github.com/apache/arrow/pull/10888#discussion_r697434894



##########
File path: r/R/dplyr-filter.R
##########
@@ -71,8 +71,14 @@ filter.Dataset <- filter.ArrowTabular <- 
filter.arrow_dplyr_query
 
 set_filters <- function(.data, expressions) {
   if (length(expressions)) {
-    # expressions is a list of Expressions. AND them together and set them on 
.data
-    new_filter <- Reduce("&", expressions)
+    if (is.list(expressions)) {
+      # expressions is a list of Expressions. AND them together and set them 
on .data
+      new_filter <- Reduce("&", expressions)
+    } else {
+      # expressions is an expression already

Review comment:
       Should we assert that it is an expression?

##########
File path: r/R/dataset-scan.R
##########
@@ -85,9 +85,26 @@ Scanner$create <- function(dataset,
       # To handle mutate() on Table/RecordBatch, we need to 
collect(as_data_frame=FALSE) now
       dataset <- dplyr::collect(dataset, as_data_frame = FALSE)
     }
+
+    proj <- c(dataset$selected_columns, dataset$temp_columns)
+    if (!is.null(projection)) {
+      # grab all of the projection elements that are characters to select 
columns
+      # TODO: should we check names and make sure we only project once per?

Review comment:
       why?

##########
File path: r/R/dataset-scan.R
##########
@@ -85,9 +85,26 @@ Scanner$create <- function(dataset,
       # To handle mutate() on Table/RecordBatch, we need to 
collect(as_data_frame=FALSE) now
       dataset <- dplyr::collect(dataset, as_data_frame = FALSE)
     }
+
+    proj <- c(dataset$selected_columns, dataset$temp_columns)
+    if (!is.null(projection)) {

Review comment:
       Can you update the docstrings above to explain what the accepted inputs 
and behaviors are? (I think it's not 100% up to date before this change too, 
but I want to be clear about what the expected inputs are (and I'm curious 
whether that will sound sensible when you write it out).)

##########
File path: r/tests/testthat/test-dataset.R
##########
@@ -1071,6 +1071,60 @@ test_that("Scanner$ToRecordBatchReader()", {
   )
 })
 
+test_that("Scanner$create() filter/projection pushdown", {
+  ds <- open_dataset(dataset_dir, partitioning = "part")
+
+  # the standard to compare all Scanner$create()s against
+  scan_one <- ds %>%
+    filter(int > 7 & dbl < 57) %>%
+    select(int, dbl, lgl) %>%
+    mutate(int_plus = int + 1, dbl_minus = dbl - 1) %>%
+    Scanner$create()
+
+  # add a column in projection
+  scan_two <- ds %>%
+    filter(int > 7 & dbl < 57) %>%
+    select(int, dbl, lgl) %>%
+    mutate(int_plus = int + 1) %>%
+    Scanner$create(projection = list(
+      "int", "dbl", "lgl", "int_plus",

Review comment:
       A combination of bare strings and named expressions? That seems odd.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to