nealrichardson commented on a change in pull request #12316:
URL: https://github.com/apache/arrow/pull/12316#discussion_r806277566
##########
File path: r/R/dataset-write.R
##########

```diff
@@ -116,25 +116,40 @@ write_dataset <- function(dataset,
   if (inherits(dataset, "arrow_dplyr_query")) {
     # partitioning vars need to be in the `select` schema
     dataset <- ensure_group_vars(dataset)
-  } else if (inherits(dataset, "grouped_df")) {
-    force(partitioning)
-    # Drop the grouping metadata before writing; we've already consumed it
-    # now to construct `partitioning` and don't want it in the metadata$r
-    dataset <- dplyr::ungroup(dataset)
+  } else {
+    if (inherits(dataset, "grouped_df")) {
+      force(partitioning)
+      # Drop the grouping metadata before writing; we've already consumed it
+      # now to construct `partitioning` and don't want it in the metadata$r
+      dataset <- dplyr::ungroup(dataset)
+    }
+    dataset <- tryCatch(
+      as_adq(dataset),
+      error = function(e) {
+        stop("'dataset' must be a Dataset, RecordBatch, Table, arrow_dplyr_query, or data.frame, not ", deparse(class(dataset)), call. = FALSE)
+      }
+    )
   }
-  scanner <- Scanner$create(dataset)
+  plan <- ExecPlan$create()
+  final_node <- plan$Build(dataset)
+  # TODO: warn/error if there is sorting/top_k? or just compute? (this needs test)
```

Review comment:

Cool. TopK is a separate issue: it's another feature only handled in a sink node. I'll handle it here by evaluating the query and then doing a new ExecPlan to write the resulting Table.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
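The approach the review comment describes — when the query contains a sink-only feature such as TopK, evaluate it to a Table first and then run a fresh ExecPlan whose sink is the dataset writer — might look roughly like the sketch below. This is only an illustrative sketch, not the actual implementation in this PR: `ExecPlan$create()`, `plan$Build()`, and `as_adq()` are taken from the diff above, but `query_has_top_k()` and `plan$Write()` are hypothetical names standing in for whatever the real internals are.

```r
# Illustrative sketch only (hypothetical helpers, not the PR's actual code).
write_dataset_sketch <- function(dataset, ...) {
  if (query_has_top_k(dataset)) {
    # TopK (e.g. head() on an arranged query) is only handled in a sink
    # node, so evaluate the query to an in-memory Table first ...
    dataset <- as_adq(dplyr::compute(dataset))
  }
  # ... then build a new ExecPlan over the materialized Table, with the
  # dataset writer as its sink. `plan$Write()` is a placeholder here for
  # the real write-sink API.
  plan <- ExecPlan$create()
  final_node <- plan$Build(dataset)
  plan$Write(final_node, ...)
}
```

The design trade-off the comment alludes to: materializing the Table costs memory, but it sidesteps the conflict between two features (TopK and dataset writing) that each need to own the plan's single sink node.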