nealrichardson commented on a change in pull request #9725:
URL: https://github.com/apache/arrow/pull/9725#discussion_r598811943
##########
File path: r/tests/testthat/test-dataset.R
##########
@@ -318,6 +357,36 @@ test_that("compressed CSV dataset", {
)
})
+test_that("CSV dataset options", {
+  dst_dir <- make_temp_dir()
+  dst_file <- file.path(dst_dir, "data.csv")
+  df <- tibble(chr = letters[1:10])
+  write.csv(df, dst_file, row.names = FALSE, quote = FALSE)
+
+  format <- FileFormat$create("csv", skip_rows = 1)
+  ds <- open_dataset(dst_dir, format = format)
+
+  expect_equivalent(
+    ds %>%
+      select(string = a) %>%
+      collect(),
+    df1[-1,] %>%
+      select(string = chr)
+  )
+
+  format <- FileFormat$create("csv", column_names = c("foo"))
+  ds <- open_dataset(dst_dir, format = format)
+  expect_is(ds$format, "CsvFileFormat")
+  expect_is(ds$filesystem, "LocalFileSystem")
Review comment:
At least one of these tests should check that the format options can be
passed directly in `...`:
```suggestion
ds <- open_dataset(dst_dir, format = "csv", column_names = c("foo"))
```
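A minimal base-R sketch of the pattern this test would exercise, assuming the dataset entry point builds the format object from `...` whenever `format` is given as a string. `make_file_format()` and `open_mock_dataset()` are hypothetical stand-ins, not arrow functions; they only show that both call styles should end up with the same format options.

```r
# Hypothetical stand-in for FileFormat$create(): keeps the format name and
# whatever format-specific options (column_names, skip_rows, ...) were passed.
make_file_format <- function(format, ...) {
  structure(list(type = format, options = list(...)), class = "mock_file_format")
}

# Hypothetical stand-in for open_dataset(): if `format` is a string, build the
# format object from `...`; if it is already an object, refuse extra options
# instead of silently dropping them.
open_mock_dataset <- function(sources, format = "parquet", ...) {
  if (is.character(format)) {
    format <- make_file_format(format, ...)
  } else if (...length() > 0) {
    stop("Format options in `...` are only used when `format` is a string")
  }
  list(sources = sources, format = format)
}

# Explicit format object vs. options passed through `...`
explicit <- open_mock_dataset(
  tempdir(),
  format = make_file_format("csv", column_names = c("foo"))
)
via_dots <- open_mock_dataset(tempdir(), format = "csv", column_names = c("foo"))

identical(explicit$format, via_dots$format)  # TRUE
```

Erroring when options arrive alongside a prebuilt format object is a design choice in this sketch; the real function may handle that case differently.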
##########
File path: r/tests/testthat/test-dataset.R
##########
@@ -295,6 +295,45 @@ test_that("CSV dataset", {
)
})
+test_that("CSV scan options", {
+  options <- FragmentScanOptions$create("text")
+  expect_equal(options$type, "csv")
+  options <- FragmentScanOptions$create("csv",
+    null_values = c("mynull"),
+    strings_can_be_null = TRUE)
+  expect_equal(options$type, "csv")
+
+  dst_dir <- make_temp_dir()
+  dst_file <- file.path(dst_dir, "data.csv")
+  df <- tibble(chr = c("foo", "mynull"))
+  write.csv(df, dst_file, row.names = FALSE, quote = FALSE)
+
+  ds <- open_dataset(dst_dir, format = "csv")
+  expect_equivalent(ds %>% collect(), df)
+
+  sb <- ds$NewScan()
+  sb$FragmentScanOptions(options)
+
+  tab <- sb$Finish()$ToTable()
Review comment:
I think we'll want to accept `scan_options` in `collect()`, since that's
how users typically run a scan, but that can be done in a follow-up.
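A rough base-R mock of what accepting `scan_options` at collection time could look like, mirroring the test's `null_values = "mynull"` / `strings_can_be_null = TRUE` setup. `mock_scan_options()` and `collect_mock()` are hypothetical illustrations, not arrow's API, and the mock only imitates null handling rather than performing a real scan.

```r
# Hypothetical scan-options constructor (not arrow's FragmentScanOptions)
mock_scan_options <- function(null_values = character(), strings_can_be_null = FALSE) {
  list(null_values = null_values, strings_can_be_null = strings_can_be_null)
}

# Hypothetical collect(): takes optional per-scan options instead of requiring
# the caller to drive a scanner builder by hand
collect_mock <- function(ds, scan_options = NULL) {
  values <- ds$chr
  # Treat configured null markers as NA only when the options allow it
  if (!is.null(scan_options) && scan_options$strings_can_be_null) {
    values[values %in% scan_options$null_values] <- NA_character_
  }
  data.frame(chr = values, stringsAsFactors = FALSE)
}

ds <- list(chr = c("foo", "mynull"))
collect_mock(ds)$chr                                       # c("foo", "mynull")
collect_mock(ds, mock_scan_options("mynull", TRUE))$chr    # c("foo", NA)
```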
--
This is an automated message from the Apache Git Service.