ianmcook commented on a change in pull request #9927:
URL: https://github.com/apache/arrow/pull/9927#discussion_r608988128
##########
File path: r/R/dplyr.R
##########
@@ -619,9 +619,23 @@ collect.arrow_dplyr_query <- function(x, as_data_frame = TRUE, ...) {
restore_dplyr_features(tab, x)
}
}
-collect.ArrowTabular <- as.data.frame.ArrowTabular
+collect.ArrowTabular <- function(x, as_data_frame = TRUE, ...) {
+ if (as_data_frame) as.data.frame(x, ...) else x
+}
collect.Dataset <- function(x, ...) dplyr::collect(arrow_dplyr_query(x), ...)
+compute.arrow_dplyr_query <- function(x, ...) dplyr::collect(x, as_data_frame = FALSE)
+compute.ArrowTabular <- function(x, ...) x
+compute.Dataset <- function(x, ...) {
Review comment:
I spent about an hour trying different approaches here, such as
paring back what `restore_dplyr_features()` does and factoring out the
compute code into a separate internal function called from both
`collect` and `compute`, but these changes caused test failures. Investigating
the failures seems beyond the scope of what we're trying to achieve here, and I
think my time would be better spent on other work. Can we leave this as is for
now and open a Jira for improvement later?
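
For reference, the shared-helper refactoring mentioned above might look roughly like the sketch below. It is only an illustration under stated assumptions: it uses toy S3 classes instead of the arrow package's real types, and `new_query()`, `run_query()`, `toy_query`, and `toy_table` are hypothetical names that do not exist in the package. It also does not reproduce whatever `restore_dplyr_features()` does, which is where the real difficulty seems to lie.

```r
# Hypothetical stand-in for a lazy query object (not an arrow class).
new_query <- function(data) structure(list(data = data), class = "toy_query")

# Hypothetical shared internal helper: evaluates the query and returns
# a table-like object without converting it to a data.frame.
run_query <- function(x) {
  structure(list(data = x$data), class = "toy_table")
}

# Minimal generics so the sketch runs without dplyr attached.
collect <- function(x, ...) UseMethod("collect")
compute <- function(x, ...) UseMethod("compute")

# collect() evaluates, then optionally converts to a data.frame.
collect.toy_query <- function(x, as_data_frame = TRUE, ...) {
  tab <- run_query(x)
  if (as_data_frame) as.data.frame(tab$data) else tab
}

# compute() evaluates and always keeps the table form.
compute.toy_query <- function(x, ...) run_query(x)

q <- new_query(list(a = 1:3))
df <- collect(q)   # data.frame
tab <- compute(q)  # toy_table
```

With this shape, `compute` no longer has to call `collect(x, as_data_frame = FALSE)`; both methods go through the same evaluation step and differ only in the final conversion.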