nealrichardson commented on a change in pull request #12316:
URL: https://github.com/apache/arrow/pull/12316#discussion_r806277566
##########
File path: r/R/dataset-write.R
##########

```diff
@@ -116,25 +116,40 @@ write_dataset <- function(dataset,
   if (inherits(dataset, "arrow_dplyr_query")) {
     # partitioning vars need to be in the `select` schema
     dataset <- ensure_group_vars(dataset)
-  } else if (inherits(dataset, "grouped_df")) {
-    force(partitioning)
-    # Drop the grouping metadata before writing; we've already consumed it
-    # now to construct `partitioning` and don't want it in the metadata$r
-    dataset <- dplyr::ungroup(dataset)
+  } else {
+    if (inherits(dataset, "grouped_df")) {
+      force(partitioning)
+      # Drop the grouping metadata before writing; we've already consumed it
+      # now to construct `partitioning` and don't want it in the metadata$r
+      dataset <- dplyr::ungroup(dataset)
+    }
+    dataset <- tryCatch(
+      as_adq(dataset),
+      error = function(e) {
+        stop("'dataset' must be a Dataset, RecordBatch, Table, arrow_dplyr_query, or data.frame, not ", deparse(class(dataset)), call. = FALSE)
+      }
+    )
   }
-  scanner <- Scanner$create(dataset)
+  plan <- ExecPlan$create()
+  final_node <- plan$Build(dataset)
+  # TODO: warn/error if there is sorting/top_k? or just compute? (this needs test)
```

Review comment:

Cool. TopK is a separate issue: it's another feature only handled in a sink node. I'll handle it here by evaluating the query and then doing a new ExecPlan to write the resulting Table.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
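The approach the review comment describes — when the query contains a sink-only feature such as TopK, evaluate it to a Table first and then run a fresh ExecPlan whose sink is the dataset writer — might look roughly like the sketch below. This is only an illustrative sketch, not the actual implementation in this PR: `ExecPlan$create()`, `plan$Build()`, and `as_adq()` are taken from the diff above, but `query_has_top_k()` and `plan$Write()` are hypothetical names standing in for whatever the real internals are.

```r
# Illustrative sketch only (hypothetical helpers, not the PR's actual code).
write_dataset_sketch <- function(dataset, ...) {
  if (query_has_top_k(dataset)) {
    # TopK (e.g. head() on an arranged query) is only handled in a sink
    # node, so evaluate the query to an in-memory Table first ...
    dataset <- as_adq(dplyr::compute(dataset))
  }
  # ... then build a new ExecPlan over the materialized Table, with the
  # dataset writer as its sink. `plan$Write()` is a placeholder here for
  # the real write-sink API.
  plan <- ExecPlan$create()
  final_node <- plan$Build(dataset)
  plan$Write(final_node, ...)
}
```

The design trade-off the comment alludes to: materializing the Table costs memory, but it sidesteps the conflict between two features (TopK and dataset writing) that each need to own the plan's single sink node.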