nealrichardson commented on a change in pull request #9591:
URL: https://github.com/apache/arrow/pull/9591#discussion_r584147300
##########
File path: r/R/dataset-write.R
##########
@@ -53,6 +53,17 @@
 #' - `codec`: A [Codec] which will be used to compress body buffers of written
 #' files. Default (NULL) will not compress body buffers.
 #' @return The input `dataset`, invisibly
+#' @examples
+#' \donttest{
+#' # we can group by cyl, cyl and gear or even more variables

Review comment:
```suggestion
#' # We can partition by one or more variables
```

##########
File path: r/R/dataset-write.R
##########
@@ -53,6 +53,17 @@
 #' - `codec`: A [Codec] which will be used to compress body buffers of written
 #' files. Default (NULL) will not compress body buffers.
 #' @return The input `dataset`, invisibly
+#' @examples
+#' \donttest{
+#' # we can group by cyl, cyl and gear or even more variables
+#' write_dataset(mtcars, tempdir(), "feather", partitioning = "cyl"))
+#' write_dataset(mtcars, tempdir(), "feather", partitioning = c("cyl", "gear"))
+#'
+#' # the latter example is the same as the following dplyr chained statement
+#' # mtcars %>%
+#' # group_by(cyl, gear) %>%
+#' # write_dataset(dout2, "feather")
+#' }

Review comment:
I think it would be good to have one more example just like these but with `hive_style = FALSE`, and to print its directory contents so that the difference between the two layouts is clear.

##########
File path: r/R/dataset-write.R
##########
@@ -53,6 +53,17 @@
 #' - `codec`: A [Codec] which will be used to compress body buffers of written
 #' files. Default (NULL) will not compress body buffers.
 #' @return The input `dataset`, invisibly
+#' @examples
+#' \donttest{
+#' # we can group by cyl, cyl and gear or even more variables
+#' write_dataset(mtcars, tempdir(), "feather", partitioning = "cyl"))

Review comment:
I know @jonkeane directed you otherwise, but I think it's useful to show the contents of the directory you wrote to, in order to demonstrate what partitioning does. So I would define `one_part_dir <- tempfile()` and `two_part_dir <- tempfile()`, and then call `dir(., recursive = TRUE)` on each directory afterwards. Note that `tempfile()`, not `tempdir()`, is what you want (see `?tempfile`). Also, for simplicity, drop "feather" and just take the default (parquet).

##########
File path: r/R/dataset-write.R
##########
@@ -53,6 +53,17 @@
 #' - `codec`: A [Codec] which will be used to compress body buffers of written
 #' files. Default (NULL) will not compress body buffers.
 #' @return The input `dataset`, invisibly
+#' @examples
+#' \donttest{
+#' # we can group by cyl, cyl and gear or even more variables
+#' write_dataset(mtcars, tempdir(), "feather", partitioning = "cyl"))
+#' write_dataset(mtcars, tempdir(), "feather", partitioning = c("cyl", "gear"))
+#'
+#' # the latter example is the same as the following dplyr chained statement
+#' # mtcars %>%

Review comment:
Don't leave this commented out; let it execute. You'll need to wrap it in something that loads `dplyr` and doesn't error if the package is not present, because it is a Suggested dependency, not a hard one.
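Taken together, the review comments describe examples roughly like the sketch below. It assumes the arrow R package's `write_dataset()` with its `partitioning` and `hive_style` arguments, plus base R `tempfile()` and `dir()`. `one_part_dir` and `two_part_dir` come from the review. `two_part_plain_dir` and `grouped_dir` are hypothetical names chosen for illustration. In the roxygen source, each line would carry the `#'` prefix inside `\donttest{}`.

```r
# Sketch only: assumes the arrow package is installed; dplyr is optional.
library(arrow)

# tempfile() returns a fresh path that does not exist yet, so each example
# writes into its own directory (tempdir() is the shared session temp dir).
one_part_dir <- tempfile()
write_dataset(mtcars, one_part_dir, partitioning = "cyl")  # default format: parquet
dir(one_part_dir, recursive = TRUE)  # files under key=value directories, e.g. cyl=4/

two_part_dir <- tempfile()
write_dataset(mtcars, two_part_dir, partitioning = c("cyl", "gear"))
dir(two_part_dir, recursive = TRUE)  # nested, e.g. cyl=4/gear=3/

# With hive_style = FALSE, directories are named by value only (4/3/)
# instead of key=value (cyl=4/gear=3/). Hypothetical variable name.
two_part_plain_dir <- tempfile()
write_dataset(mtcars, two_part_plain_dir,
  partitioning = c("cyl", "gear"),
  hive_style = FALSE
)
dir(two_part_plain_dir, recursive = TRUE)

# dplyr is only a Suggested dependency, so run this part only if it is
# installed. By default write_dataset() partitions by the grouping variables,
# so this should produce the same layout as two_part_dir.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  grouped_dir <- tempfile()  # hypothetical variable name
  mtcars %>%
    group_by(cyl, gear) %>%
    write_dataset(grouped_dir)
  dir(grouped_dir, recursive = TRUE)
}
```

Listing the directory after each call makes the partitioning visible without reading the files back. The exact file names inside each directory (for example `part-0.parquet`) may differ between arrow versions.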