paleolimbot commented on issue #14732:
URL: https://github.com/apache/arrow/issues/14732#issuecomment-1327511328

   Thanks for posting! I think this is a situation where the source of the 
dplyr query is *another* dplyr query:
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
   library(dplyr, warn.conflicts = FALSE)
   ds_file <- file.path(tempdir(), "mtcars")
   
   write_dataset(mtcars |> select(mpg, cyl), ds_file)
   ds <- open_dataset(ds_file)
   
   # filter is printed | EXPECTED
   q <- ds |> filter(mpg > 25)
   class(q$.data)
   #> [1] "FileSystemDataset" "Dataset"           "ArrowObject"      
   #> [4] "R6"
   
   q <- ds |> 
     filter(mpg > 25) |> 
     summarise(mpg = mean(mpg))
   class(q$.data)
   #> [1] "arrow_dplyr_query"
   
   print(q$.data)
   #> FileSystemDataset (query)
   #> mpg: double
   #> 
   #> * Aggregations:
   #> mpg: mean(mpg)
   #> * Filter: (mpg > 25)
   #> See $.data for the source Arrow object
   print(q)
   #> FileSystemDataset (query)
   #> mpg: double
   #> 
   #> See $.data for the source Arrow object
   ```
   
   <sup>Created on 2022-11-25 with [reprex 
v2.0.2](https://reprex.tidyverse.org)</sup>
   
   We probably just need to make sure to `print()` the `.data` recursively if 
`inherits(.data, "arrow_dplyr_query")`; otherwise some of the steps may be 
hidden, as you discovered!
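   
   A rough sketch of the recursive idea, using a toy S3 class rather than 
   arrow's real print method. `toy_query`, `new_toy_query()`, and 
   `collect_steps()` are hypothetical names made up for illustration; only the 
   `.data` field and the `inherits()` check mirror what is described above:
   
   ```r
   # Toy stand-in for a nested query; this is NOT arrow's arrow_dplyr_query class.
   new_toy_query <- function(.data, step) {
     structure(list(.data = .data, step = step), class = "toy_query")
   }
   
   # Walk down the .data chain and collect every step, innermost query first.
   collect_steps <- function(x) {
     if (!inherits(x, "toy_query")) {
       # Reached the underlying source object: nothing more to report.
       return(character())
     }
     c(collect_steps(x$.data), x$step)
   }
   
   # Print method that reports the steps of the nested queries too.
   print.toy_query <- function(x, ...) {
     cat("toy_query\n")
     cat(paste0("* ", collect_steps(x), "\n"), sep = "")
     invisible(x)
   }
   
   source_tbl <- data.frame(mpg = c(21, 30.4), cyl = c(6, 4))
   inner <- new_toy_query(source_tbl, "Filter: (mpg > 25)")
   outer <- new_toy_query(inner, "Aggregations: mpg: mean(mpg)")
   print(outer)
   ```
   
   In arrow itself the analogous check would presumably be 
   `inherits(x$.data, "arrow_dplyr_query")`, with the existing print logic 
   applied at each level instead of a single `step` string.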
   
   The print method is here: 
https://github.com/apache/arrow/blob/2078af7c710d688c14313b9486b99c981550a7b7/r/R/dplyr.R#L116-L168

