paleolimbot commented on code in PR #38002:
URL: https://github.com/apache/arrow/pull/38002#discussion_r1349137559
##########
r/R/csv.R:
##########
@@ -279,6 +280,31 @@ read_csv_arrow <- function(file,
eval.parent(mc)
}
+#' @rdname read_delim_arrow
+#' @export
+read_csv2_arrow <- function(file,
+ quote = '"',
+ escape_double = TRUE,
+ escape_backslash = FALSE,
+ schema = NULL,
+ col_names = TRUE,
+ col_types = NULL,
+ col_select = NULL,
+ na = c("", "NA"),
+ quoted_na = TRUE,
+ skip_empty_rows = TRUE,
+ skip = 0L,
+ parse_options = NULL,
+ convert_options =
CsvConvertOptions$create(decimal_point = ","),
+ read_options = NULL,
+ as_data_frame = TRUE,
+ timestamp_parsers = NULL) {
Review Comment:
I imagine these are freshly copy/pasted, but maybe double-check that the
defaults are correct.
##########
r/tests/testthat/test-csv.R:
##########
@@ -733,3 +733,10 @@ test_that("Can read CSV files from a URL", {
expect_true(tibble::is_tibble(cu))
expect_identical(dim(cu), c(100L, 13L))
})
+
+test_that("read_csv2_arrow correctly parses comma decimals", {
+ tf <- tempfile()
Review Comment:
```suggestion
tf <- tempfile()
on.exit(unlink(tf))
```
##########
r/tests/testthat/test-dataset-csv.R:
##########
@@ -637,3 +637,18 @@ test_that("GH-34640 - CSV datasets are read in correctly
when both schema and pa
summarize(mean = mean(integer))
)
})
+
+test_that("open_dataset() with `decimal_point` argument", {
+ temp_dir <- make_temp_dir()
Review Comment:
```suggestion
temp_dir <- make_temp_dir()
on.exit(unlink(temp_dir, recursive = TRUE))
```
##########
r/R/csv.R:
##########
@@ -279,6 +280,31 @@ read_csv_arrow <- function(file,
eval.parent(mc)
}
+#' @rdname read_delim_arrow
+#' @export
+read_csv2_arrow <- function(file,
+ quote = '"',
+ escape_double = TRUE,
+ escape_backslash = FALSE,
+ schema = NULL,
+ col_names = TRUE,
+ col_types = NULL,
+ col_select = NULL,
+ na = c("", "NA"),
+ quoted_na = TRUE,
+ skip_empty_rows = TRUE,
+ skip = 0L,
+ parse_options = NULL,
+ convert_options =
CsvConvertOptions$create(decimal_point = ","),
Review Comment:
This pattern seems like it might not generalize...if a user was using
`read_csv_arrow(..., convert_options = ...)`, they can't simply change that to
`read_csv2_arrow(..., convert_options = ...)` and get the behaviour they were
expecting. I wonder if adding `decimal_point` as a top-level argument to
`read_delim_arrow()` would lead to a more predictable relationship between
`read_csv_arrow()` and `read_csv2_arrow()`?
##########
r/tests/testthat/test-dataset-csv.R:
##########
@@ -637,3 +637,18 @@ test_that("GH-34640 - CSV datasets are read in correctly
when both schema and pa
summarize(mean = mean(integer))
)
})
+
+test_that("open_dataset() with `decimal_point` argument", {
+ temp_dir <- make_temp_dir()
+ writeLines("x\ty\n1,2\tc", con = file.path(temp_dir, "file1.csv"))
+
+ expect_equal(
+ open_dataset(temp_dir, format = "tsv") %>% collect(),
+ tibble(x = "1,2", y = "c")
+ )
+
+ expect_equal(
+ open_dataset(temp_dir, format = "tsv", decimal_point = ",") %>% collect(),
+ tibble(x = 1.2, y = "c")
+ )
Review Comment:
What happens if you try to pass a `decimal_point = "more than one
character"` or `decimal_point = ""`? (Not sure if it needs a test, but worth
checking that it doesn't crash)
##########
r/R/csv.R:
##########
@@ -279,6 +280,31 @@ read_csv_arrow <- function(file,
eval.parent(mc)
}
+#' @rdname read_delim_arrow
+#' @export
+read_csv2_arrow <- function(file,
+ quote = '"',
+ escape_double = TRUE,
+ escape_backslash = FALSE,
+ schema = NULL,
+ col_names = TRUE,
+ col_types = NULL,
+ col_select = NULL,
+ na = c("", "NA"),
+ quoted_na = TRUE,
+ skip_empty_rows = TRUE,
+ skip = 0L,
+ parse_options = NULL,
+ convert_options =
CsvConvertOptions$create(decimal_point = ","),
Review Comment:
(Or simply documenting that you can add that to `convert_options` and leave
the tidyverse-interface-copying for another time)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]