jonkeane commented on a change in pull request #9748:
URL: https://github.com/apache/arrow/pull/9748#discussion_r697666647



##########
File path: r/R/dataset-write.R
##########
@@ -54,6 +54,44 @@
 #' - `null_fallback`: character to be used in place of missing values (`NA` or
 #' `NULL`) when using Hive-style partitioning. See [hive_partition()].
 #' @return The input `dataset`, invisibly
+#' @examplesIf arrow_with_dataset() & arrow_with_parquet() & requireNamespace("dplyr", quietly = TRUE)
+#' # You can write datasets partitioned by the values in a column (here: "cyl").
+#' # This creates a structure of the form cyl=X/part-Z.parquet.
+#' one_level_tree <- tempfile()
+#' write_dataset(mtcars, one_level_tree, partitioning = "cyl")
+#' list.files(one_level_tree, recursive = TRUE)
+#'
+#' # You can also partition by the values in multiple columns
+#' # (here: "cyl" and "gear").
+#' # This creates a structure of the form cyl=X/gear=Y/part-Z.parquet.
+#' two_levels_tree <- tempfile()
+#' write_dataset(mtcars, two_levels_tree, partitioning = c("cyl", "gear"))
+#' list.files(two_levels_tree, recursive = TRUE)
+#'
+#' # In the two previous examples we would have:
+#' # X = \{4,6,8\}, the number of cylinders.
+#' # Y = \{3,4,5\}, the number of forward gears.
+#' # Z = \{0,1,2\}, the number of saved parts, starting from 0.
+#'
+#' # You can obtain the same result as the previous examples using arrow with
+#' # a dplyr pipeline:
+#'
+#' d <- group_by(mtcars, cyl, gear)
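
A minimal sketch (not part of the diff above) of the `null_fallback` option described in the context lines of this hunk, for the case where a partition column contains missing values. The `"no_cyl"` placeholder is purely illustrative; passing `null_fallback` through `write_dataset()` follows the documentation text shown above.

```
library(arrow)

# Introduce a missing value in the partition column so null_fallback applies.
df <- mtcars
df$cyl[1] <- NA

hive_tree <- tempfile()
# With the default hive-style partitioning, rows where cyl is NA are written
# under cyl=no_cyl/ instead of the library's default placeholder directory.
write_dataset(df, hive_tree, partitioning = "cyl", null_fallback = "no_cyl")
list.files(hive_tree, recursive = TRUE)
```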

Review comment:
       Could we move this line down to between lines 84 and 85? Also, I think it would be better if it were all in a pipeline (since that's what we're telling people we are using here):
   
   ```
   mtcars %>%
     group_by(cyl, gear) %>%
     write_dataset(two_levels_tree_2)
   ```
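
For reference, `write_dataset()` defaults to using the current `group_by()` groups as the partitioning when `partitioning` is not supplied, so the suggested pipeline should produce the same `cyl=X/gear=Y` layout as the explicit `partitioning = c("cyl", "gear")` example. A hedged sketch of the full pipeline, assuming `two_levels_tree_2` is a fresh temporary directory:

```
library(arrow)
library(dplyr)

two_levels_tree_2 <- tempfile()

# The group_by() groups become the partition keys because `partitioning`
# is not supplied explicitly.
mtcars %>%
  group_by(cyl, gear) %>%
  write_dataset(two_levels_tree_2)

# Expected to list files of the form cyl=X/gear=Y/part-Z.parquet.
list.files(two_levels_tree_2, recursive = TRUE)
```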




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

