[GitHub] [arrow] jonkeane commented on a change in pull request #9748: ARROW-11729: [R] Add examples to datasets documentation

GitBox Mon, 19 Apr 2021 12:31:04 -0700


jonkeane commented on a change in pull request #9748:
URL: https://github.com/apache/arrow/pull/9748#discussion_r616122141




##########
File path: r/R/dataset-write.R
##########
@@ -54,6 +54,49 @@
 #' - `null_fallback`: character to be used in place of missing values (`NA` or
 #' `NULL`) when using Hive-style partitioning. See [hive_partition()].
 #' @return The input `dataset`, invisibly
+#' @examples
+#' # We start by creating temporary directories.
+#' one_part_dir <- tempfile()
+#' two_part_dir <- tempfile()
+#' 
+#' # We can write datasets partitioned by the values in a column (here: "cyl").
+#' # This creates a structure of the form cyl=X/part-Z.parquet.
+#' write_dataset(mtcars, one_part_dir, partitioning = "cyl")
+#'
+#' # We can also partition by the values in multiple columns.
+#' # This creates a structure of the form cyl=X/gear=Y/part-Z.parquet.
+#' write_dataset(mtcars, two_part_dir, partitioning = c("cyl", "gear"))
+#'
+#' # In the two previous examples we would have:
+#' # X = \{4,6,8\}, the number of cylinders.
+#' # Y = \{3,4,5\}, the number of forward gears.
+#' # Z = \{0,1,2\}, the number of saved fragments, starting from 0.
+#' 
+#' # And we can check what we just saved.
+#' list.files(one_part_dir, recursive = TRUE)
+#' list.files(two_part_dir, recursive = TRUE)
+#'
+#' # We can do the same as the previous call with two variables combining both
+#' # arrow and dplyr, so the example is just a repetition with different steps.
+#' # We shall do it exactly as above and then with a slight change to the
+#' # output.
+#'
+#' if(requireNamespace("dplyr", quietly = TRUE)) {
+#'  d <- mtcars %>% group_by(cyl, gear)
+#'
+#'  # Write a structure X/Y/part-Z.parquet.
+#'  two_part_dir_2 <- tempfile()
+#'  d %>% write_dataset(two_part_dir_2)

Review comment:
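       Does the comment on line 87 (`# Write a structure X/Y/part-Z.parquet.`) 
match the output here? I would expect this `write_dataset()` call to use 
Hive-style partitioning (since `hive_style = TRUE` is the default), which 
would give `cyl=X/gear=Y/part-Z.parquet` rather than `X/Y/part-Z.parquet`.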
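
A minimal sketch for checking this locally, assuming the `arrow` and `dplyr` packages are installed. The names `hive_dir`, `plain_dir`, `hive_files`, and `plain_files` are made up for illustration and are not part of the PR. It writes the same grouped data twice, once with the default and once with `hive_style = FALSE`, so the two directory layouts can be compared:

```r
library(arrow)
library(dplyr)

# Group by two columns; write_dataset() picks up the grouping
# variables as the partitioning columns.
d <- mtcars %>% group_by(cyl, gear)

# Hypothetical output locations, for illustration only.
hive_dir <- tempfile()
plain_dir <- tempfile()

# Default call: hive_style is TRUE unless overridden, so directory
# names should carry the column name (key=value).
write_dataset(d, hive_dir)

# Explicitly turn Hive-style naming off: directories are named by value only.
write_dataset(d, plain_dir, hive_style = FALSE)

hive_files <- list.files(hive_dir, recursive = TRUE)
plain_files <- list.files(plain_dir, recursive = TRUE)

print(hive_files)
print(plain_files)
```

If the first listing shows `cyl=.../gear=.../` prefixes, the comment on line 87 describes only the `hive_style = FALSE` case.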




