[GitHub] [arrow] jonkeane commented on a change in pull request #9591: ARROW-11729: [R] Add examples to the datasets documentation

GitBox Fri, 26 Feb 2021 15:17:24 -0800


jonkeane commented on a change in pull request #9591:
URL: https://github.com/apache/arrow/pull/9591#discussion_r583975843




##########
File path: r/R/dataset-write.R
##########
@@ -53,6 +53,34 @@
 #' - `codec`: A [Codec] which will be used to compress body buffers of written
 #'   files. Default (NULL) will not compress body buffers.
 #' @return The input `dataset`, invisibly
+#' @examples
+#' dout1 <- paste0(tempdir(), "/direct")
+#' dout2 <- paste0(tempdir(), "/group_by")
+#'
+#' # partitioning ----
+#' write_dataset(mtcars, dout1, "feather",
+#'  partitioning = "cyl",
+#'  basename_template = "{i}.feather",
+#'  hive_style = F)

Review comment:
       I wonder if we should leave off the `basename_template` and `hive_style` 
arguments here for clarity. The main focus is on how the partitioning is 
defined, and those arguments might be more of a distraction than a help.
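
Something like the following is roughly what I have in mind. It is only a sketch: it relies on the defaults for `basename_template` and `hive_style`, and `dout` is a placeholder name rather than anything from the PR.

```r
library(arrow)

# Placeholder output directory for this sketch
dout <- tempfile("direct")

# Only the partitioning is spelled out; every other argument keeps its default
write_dataset(mtcars, dout, format = "feather", partitioning = "cyl")

# Inspect the files that were written, one directory per value of cyl
list.files(dout, recursive = TRUE)
```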

##########
File path: r/R/dataset-write.R
##########
@@ -53,6 +53,34 @@
 #' - `codec`: A [Codec] which will be used to compress body buffers of written
 #'   files. Default (NULL) will not compress body buffers.
 #' @return The input `dataset`, invisibly
+#' @examples
+#' dout1 <- paste0(tempdir(), "/direct")

Review comment:
       I suspect we will want to wrap this in `\donttest{}`, since the example 
might not fully function on some platforms. Ideally we wouldn't do that, but 
we might have to here.
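
A sketch of how that wrapping could look in the roxygen block, trimmed to a single call for brevity (the exact contents of the example are still up for discussion):

```r
#' @examples
#' \donttest{
#' dout1 <- paste0(tempdir(), "/direct")
#'
#' # Everything inside \donttest{} is skipped by the default R CMD check run
#' write_dataset(mtcars, dout1, "feather", partitioning = "cyl")
#' }
```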

##########
File path: r/R/dataset-write.R
##########
@@ -53,6 +53,34 @@
 #' - `codec`: A [Codec] which will be used to compress body buffers of written
 #'   files. Default (NULL) will not compress body buffers.
 #' @return The input `dataset`, invisibly
+#' @examples
+#' dout1 <- paste0(tempdir(), "/direct")
+#' dout2 <- paste0(tempdir(), "/group_by")
+#'
+#' # partitioning ----
+#' write_dataset(mtcars, dout1, "feather",
+#'  partitioning = "cyl",
+#'  basename_template = "{i}.feather",
+#'  hive_style = F)
+#'
+#' # group_by (same result as above) ----
+#' library(dplyr)
+#'
+#' mtcars %>%
+#'  group_by(cyl) %>%
+#'  write_dataset(dout2, "feather",
+#'   basename_template = "{i}.feather",
+#'   hive_style = F)
+#'
+#' # compare ----
+#' finp1 <- list.files(dout1, full.names = T, recursive = T,
+#'  pattern = "feather")
+#'
+#' finp2 <- list.files(dout2, full.names = T, recursive = T,
+#'  pattern = "feather")
+#'
+#' finp1
+#' finp2

Review comment:
       I think we can use comments to say the two are equivalent and we don't 
need to prove it with this code down here. That'll make the example a little 
tighter and clearer.
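
For instance, the example could stop after the two calls and let the comments carry the point. This is a sketch that assumes the default file naming is fine; the `dout1` and `dout2` names mirror the ones in the diff but use `tempfile()` here only to keep the sketch self-contained.

```r
library(arrow)
library(dplyr)

dout1 <- tempfile("direct")
dout2 <- tempfile("group_by")

# Partition explicitly on cyl
write_dataset(mtcars, dout1, format = "feather", partitioning = "cyl")

# Equivalent: the grouping variable from group_by() is used as the partitioning,
# so this writes the same directory layout as the call above
mtcars %>%
  group_by(cyl) %>%
  write_dataset(dout2, format = "feather")
```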

##########
File path: r/R/dataset-write.R
##########
@@ -53,6 +53,34 @@
 #' - `codec`: A [Codec] which will be used to compress body buffers of written
 #'   files. Default (NULL) will not compress body buffers.
 #' @return The input `dataset`, invisibly
+#' @examples
+#' dout1 <- paste0(tempdir(), "/direct")
+#' dout2 <- paste0(tempdir(), "/group_by")
+#'
+#' # partitioning ----
+#' write_dataset(mtcars, dout1, "feather",
+#'  partitioning = "cyl",

Review comment:
       It might be nice to have a two-variable partitioning scheme here instead 
of just one. That way the example shows that the partitioning can be any number 
of column names / groups in `group_by()`.
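
As a rough illustration, here is a sketch with two columns. The choice of `cyl` and `gear` is mine and just an example; any pair of `mtcars` columns would do.

```r
library(arrow)
library(dplyr)

dout1 <- tempfile("direct")
dout2 <- tempfile("group_by")

# Two partitioning columns, named directly; directories nest in the order given
write_dataset(mtcars, dout1, format = "feather",
  partitioning = c("cyl", "gear"))

# The same two columns supplied as groups
mtcars %>%
  group_by(cyl, gear) %>%
  write_dataset(dout2, format = "feather")

# Paths show one directory level per partitioning column
list.files(dout1, recursive = TRUE)
```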




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
