nealrichardson commented on a change in pull request #10141:
URL: https://github.com/apache/arrow/pull/10141#discussion_r625288659
##########
File path: r/R/csv.R
##########
@@ -585,3 +602,52 @@ readr_to_csv_convert_options <- function(na,
include_columns = include_columns
)
}
+
+#' Write CSV file to disk
+#'
+#' @param x `data.frame`, [RecordBatch], or [Table]
+#' @param sink A string file path, URI, or [OutputStream], or path in a file
+#' system (`SubTreeFileSystem`)
+#' @param include_header Whether to write an initial header line with column
names
+#' @param batch_size Maximum number of rows processed at a time. Default is
1024.
+#'
+#' @return The input `x`, invisibly. Note that if `sink` is an [OutputStream],
+#' the stream will be left open.
+#' @export
+#' @examples
+#' \donttest{
+#' tf <- tempfile()
+#' on.exit(unlink(tf))
+#' write_csv_arrow(mtcars, tf)
+#' }
+#' @include arrow-package.R
+write_csv_arrow <- function(x,
+ sink,
+ include_header = TRUE,
+ batch_size = 1024L) {
+ # Handle and validate options before touching data
+ batch_size <- as.integer(batch_size)
+ assert_that(batch_size > 0)
Review comment:
Let's move this to CsvWriteOptions$create(). I like to keep the input
validation as close to the C++ call site as possible, just so that if someone
uses the lower-level bindings, they still have protection from segfaults.
```suggestion
```
##########
File path: r/R/csv.R
##########
@@ -408,6 +413,18 @@ CsvReadOptions$create <- function(use_threads =
option_use_threads(),
)
}
+#' @rdname CsvReadOptions
+#' @export
+CsvWriteOptions <- R6Class("CsvWriteOptions", inherit = ArrowObject)
+CsvWriteOptions$create <- function(include_header = TRUE, batch_size = 1024L){
+ csv___WriteOptions__initialize(
+ list(
+ include_header = include_header,
+ batch_size = batch_size
+ )
+ )
Review comment:
`rlang::is_integerish` can handle a few other validation cases we want
(length 1, not missing):
```
> rlang::is_integerish(1:4)
[1] TRUE
> rlang::is_integerish(1:4, n = 1)
[1] FALSE
> rlang::is_integerish(NA_integer_)
[1] TRUE
> rlang::is_integerish(NA_integer_, finite = TRUE)
[1] FALSE
```
```suggestion
assert_that(is_integerish(batch_size, n = 1, finite = TRUE), batch_size >
0)
csv___WriteOptions__initialize(
list(
include_header = include_header,
batch_size = as.integer(batch_size)
)
)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]