[GitHub] [arrow] toppyy commented on a change in pull request #12083: ARROW-14744: [R] open_dataset() error when `schema` argument supplied, but `column_names` not supplied to `CSVReadOptions`

GitBox Mon, 17 Jan 2022 11:26:38 -0800


toppyy commented on a change in pull request #12083:
URL: https://github.com/apache/arrow/pull/12083#discussion_r786252402




##########
File path: r/R/dataset-format.R
##########
@@ -122,6 +122,18 @@ CsvFileFormat$create <- function(...,
                                  opts = csv_file_format_parse_options(...),
                                  convert_options = csv_file_format_convert_opts(...),
                                  read_options = csv_file_format_read_opts(...)) {
+
+  options <- list(...)
+  schema  <- options[["schema"]]
+
+  if (length(read_options$column_names) > 0 & !is.null(schema) & !identical(names(schema), read_options$column_names)) {
+    abort(c(
+        '"column_names" in read_options do not match the schema.',
+      i = "Set column_names in read_options to match the schema",
+      i = "Omit the read_options argument"
+    ))

Review comment:
       Thanks for the good comments!
   
   It raises an error and prints out the mismatches between `read_options$column_names` and `names(schema)`.
   
   At the moment a mismatch is due to either 1) a set difference or 2) a different order of column names. However, I'm a bit unsure about the latter. Do we in fact want to raise an error if the order of the names differs? For example: `c("a","b") != c("b","a")`
   
   Also, if this is the case, I'd be interested in any tips on convenient ways to do this. I resorted to using `suppressWarnings` to handle cases where the vector lengths differ.
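
One possible way to compare the two name vectors without `suppressWarnings` is sketched below in base R. `compare_column_names` is a hypothetical helper for illustration, not part of the arrow package or of this PR, and it assumes the names within each vector are unique. `setdiff()` and `identical()` both accept vectors of different lengths without recycling warnings, so the set difference and the order check can be reported separately.

```r
# Hypothetical helper (not part of arrow): compares two character vectors
# of column names and reports how they differ.
# Assumes names within each vector are unique.
compare_column_names <- function(schema_names, column_names) {
  # Names present on one side only; setdiff() does not warn on unequal lengths
  only_in_schema <- setdiff(schema_names, column_names)
  only_in_column_names <- setdiff(column_names, schema_names)

  same_set <- length(only_in_schema) == 0 && length(only_in_column_names) == 0
  exact_match <- identical(schema_names, column_names)

  list(
    identical = exact_match,
    same_set = same_set,
    # TRUE when both sides hold the same names but in a different order
    order_differs = same_set && !exact_match,
    only_in_schema = only_in_schema,
    only_in_column_names = only_in_column_names
  )
}

# Same names, different order
res <- compare_column_names(c("a", "b"), c("b", "a"))
print(res$order_differs)

# Different lengths, no warning raised
res2 <- compare_column_names(c("a", "b", "c"), c("a"))
print(res2$only_in_schema)
```

Keeping `same_set` and `order_differs` as separate fields would let the caller decide whether an order-only mismatch should be an error, which is the open question above.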




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

