nealrichardson commented on code in PR #33614:
URL: https://github.com/apache/arrow/pull/33614#discussion_r1068520934


##########
r/R/dataset.R:
##########
@@ -228,6 +228,86 @@ open_dataset <- function(sources,
   )
 }
 
+#' Open a multi-file dataset of CSVs
+#'
+#' A wrapper around [open_dataset] which explicitly includes parameters 
specific to reading CSV files
+#'
+#' @inheritParams open_dataset
+#' @inheritParams read_csv_arrow
+#'

Review Comment:
   It might be helpful to include an example here that shows why you would want 
this function, i.e. what it simplifies. (A reprex of that would also be nice to 
include on the PR description to demonstrate why this needs to exist.)
   
   Other things worth considering:
   
   * Working this function into a vignette
   * xrefs from the `open_dataset` docs (`@seealso`; an inline reference in the 
"format" argument; etc.) and the `read_delim_arrow` docs.



##########
r/tests/testthat/test-dataset-csv.R:
##########
@@ -476,3 +476,32 @@ test_that("CSV reading/parsing/convert options can be 
passed in as lists", {
 
   expect_equal(ds1, ds2)
 })
+
+test_that("open_csv_dataset params passed through to open_dataset", {
+  ds <- open_csv_dataset(csv_dir, partitioning = "part")
+  expect_r6_class(ds$format, "CsvFileFormat")
+  expect_r6_class(ds$filesystem, "LocalFileSystem")
+  expect_identical(names(ds), c(names(df1), "part"))
+  expect_identical(dim(ds), c(20L, 7L))
+
+  # delim
+  ds_tsv <- open_csv_dataset(tsv_dir, delim = "\t", partitioning = "part")

Review Comment:
   This kinda begs the question: should there also be `open_tsv_dataset()`, 
`open_delim_dataset()`, etc.?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to