[GitHub] [arrow] nealrichardson commented on a change in pull request #9591: ARROW-11729: [R] Add examples to the datasets documentation

GitBox Sat, 27 Feb 2021 08:02:01 -0800


nealrichardson commented on a change in pull request #9591:
URL: https://github.com/apache/arrow/pull/9591#discussion_r584147300




##########
File path: r/R/dataset-write.R
##########
@@ -53,6 +53,17 @@
 #' - `codec`: A [Codec] which will be used to compress body buffers of written
 #'   files. Default (NULL) will not compress body buffers.
 #' @return The input `dataset`, invisibly
+#' @examples
+#' \donttest{
+#' # we can group by cyl, cyl and gear or even more variables

Review comment:
       ```suggestion
   #' # We can partition by one or more variables
   ```

##########
File path: r/R/dataset-write.R
##########
@@ -53,6 +53,17 @@
 #' - `codec`: A [Codec] which will be used to compress body buffers of written
 #'   files. Default (NULL) will not compress body buffers.
 #' @return The input `dataset`, invisibly
+#' @examples
+#' \donttest{
+#' # we can group by cyl, cyl and gear or even more variables
+#' write_dataset(mtcars, tempdir(), "feather", partitioning = "cyl"))
+#' write_dataset(mtcars, tempdir(), "feather", partitioning = c("cyl", "gear"))
+#'
+#' # the latter example is the same as the following dplyr chained statement
+#' # mtcars %>%
+#' #  group_by(cyl, gear) %>%
+#' #  write_dataset(dout2, "feather")
+#' }

Review comment:
       I think it would be good to add one more example just like these, but 
with `hive_style = FALSE`, and print its directory contents so that the 
difference between the two directory layouts is clear.
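
A minimal sketch of what such an example could look like, assuming the `arrow` R package is installed. The names `hive_dir` and `plain_dir` are illustrative and do not come from the PR; the exact file names inside each directory may vary between arrow versions, so none are shown here.

```r
library(arrow)

# Write the same data twice, once with each partition naming style.
hive_dir <- tempfile()
plain_dir <- tempfile()

# Default: hive-style directory names that include the column name ("cyl=<value>")
write_dataset(mtcars, hive_dir, partitioning = "cyl")

# hive_style = FALSE: directory names contain only the partition value
write_dataset(mtcars, plain_dir, partitioning = "cyl", hive_style = FALSE)

# List the files written under each directory to compare the layouts
hive_files <- dir(hive_dir, recursive = TRUE)
plain_files <- dir(plain_dir, recursive = TRUE)
print(hive_files)
print(plain_files)
```

With the default, each path starts with a `cyl=` segment; with `hive_style = FALSE`, the segment is just the value of `cyl`.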

##########
File path: r/R/dataset-write.R
##########
@@ -53,6 +53,17 @@
 #' - `codec`: A [Codec] which will be used to compress body buffers of written
 #'   files. Default (NULL) will not compress body buffers.
 #' @return The input `dataset`, invisibly
+#' @examples
+#' \donttest{
+#' # we can group by cyl, cyl and gear or even more variables
+#' write_dataset(mtcars, tempdir(), "feather", partitioning = "cyl"))

Review comment:
       I know @jonkeane directed you otherwise, but I think it's useful to show 
the contents of the directory you wrote to, in order to demonstrate what 
partitioning does. I would define `one_part_dir <- tempfile()` and 
`two_part_dir <- tempfile()` (note that `tempfile()`, not `tempdir()`, is what 
you want; see `?tempfile`) and then call `dir(., recursive = TRUE)` on each 
directory afterwards.
   
   Also, for simplicity, drop "feather" and just take the default (parquet).
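
The suggestion above could be sketched roughly as follows, assuming the `arrow` R package is installed. The directory variable names are the ones proposed in the comment; everything else is an illustration, not the final text of the PR.

```r
library(arrow)

# tempfile() returns a fresh, not-yet-existing path for each call,
# whereas tempdir() always returns the same per-session directory.
one_part_dir <- tempfile()
two_part_dir <- tempfile()

# Partition by a single variable, using the default format (parquet)
write_dataset(mtcars, one_part_dir, partitioning = "cyl")

# Partition by two variables, which produces nested directories
write_dataset(mtcars, two_part_dir, partitioning = c("cyl", "gear"))

# Show what was written so the effect of partitioning is visible
one_part_files <- dir(one_part_dir, recursive = TRUE)
two_part_files <- dir(two_part_dir, recursive = TRUE)
print(one_part_files)
print(two_part_files)
```

Using a separate `tempfile()` path per call keeps the two listings from mixing, which would happen if both datasets were written into `tempdir()`.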

##########
File path: r/R/dataset-write.R
##########
@@ -53,6 +53,17 @@
 #' - `codec`: A [Codec] which will be used to compress body buffers of written
 #'   files. Default (NULL) will not compress body buffers.
 #' @return The input `dataset`, invisibly
+#' @examples
+#' \donttest{
+#' # we can group by cyl, cyl and gear or even more variables
+#' write_dataset(mtcars, tempdir(), "feather", partitioning = "cyl"))
+#' write_dataset(mtcars, tempdir(), "feather", partitioning = c("cyl", "gear"))
+#'
+#' # the latter example is the same as the following dplyr chained statement
+#' # mtcars %>%

Review comment:
       Don't leave this commented out; let it execute. You'll need to wrap it in 
something that loads `dplyr` and doesn't error if the package is not installed, 
because it is a Suggested dependency, not a hard one.
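
One possible way to do this, shown as a hedged sketch rather than the wording the PR must use: guard the example with `requireNamespace()` so it runs only when `dplyr` is available. The name `grouped_dir` is hypothetical, and the `arrow` R package is assumed to be installed.

```r
library(arrow)

grouped_dir <- tempfile()

# Run the dplyr version only if dplyr is installed; otherwise skip silently
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr, warn.conflicts = FALSE)

  # Grouping variables are used as the partitioning when none is given
  mtcars %>%
    group_by(cyl, gear) %>%
    write_dataset(grouped_dir)

  print(dir(grouped_dir, recursive = TRUE))
}
```

Because the whole block sits inside the `if`, the example does not error on a machine without `dplyr`.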



