ianmcook commented on a change in pull request #9143:
URL: https://github.com/apache/arrow/pull/9143#discussion_r554277501



##########
File path: r/R/dataset-format.R
##########
@@ -104,9 +104,31 @@ CsvFileFormat$create <- function(..., opts = csv_file_format_parse_options(...))
 }
 
 csv_file_format_parse_options <- function(...) {
+  opt_names <- names(list(...))
   # Support both the readr spelling of options and the arrow spelling
-  readr_opts <- c("delim", "quote", "escape_double", "escape_backslash", "skip_empty_rows")
-  if (any(readr_opts %in% names(list(...)))) {
+  arrow_opts <- names(formals(CsvParseOptions$create))
+  readr_opts <- names(formals(readr_to_csv_parse_options))
+  is_arrow_opt <- !is.na(pmatch(opt_names, arrow_opts))
+  is_readr_opt <- !is.na(pmatch(opt_names, readr_opts))
+  bad_opts <- opt_names[!is_arrow_opt & !is_readr_opt]
+  if (length(bad_opts)) {
+    stop("Unsupported options: ",
+         paste(bad_opts, collapse = ", "),
+         call. = FALSE)
+  }
+  is_ambig_opt <- is.na(pmatch(opt_names, c(arrow_opts, readr_opts)))

Review comment:
       `arrow_opts` is a vector of the names of the allowed arguments to 
`CsvParseOptions$create()` (Arrow-style arguments). `readr_opts` is a vector of 
the names of the allowed arguments to `readr_to_csv_parse_options()` 
(readr-style arguments).
   ```r
   arrow_opts
   ## [1] "delimiter"          "quoting"            "quote_char"         "double_quote"       "escaping"
   ## [6] "escape_char"        "newlines_in_values" "ignore_empty_lines"
   
   readr_opts
   ## [1] "delim"            "quote"            "escape_double"    "escape_backslash" "skip_empty_rows"
   ```
   The function we're inside here (`csv_file_format_parse_options()`) allows 
_either_ of these sets of arguments. These two sets of argument names are 
mutually exclusive, but R's partial matching of argument names throws a wrench 
in that. For example, if someone shortens either `delimiter` or `delim` to just 
`del`, that would work fine in a function that accepts _only_ Arrow-style 
arguments or _only_ readr-style arguments, but here it creates ambiguity: we 
can't tell whether the user intends to specify Arrow-style arguments or 
readr-style arguments.
   
   ```r
   open_dataset("/path/to/csv/", format = "csv", del = ";")
   ## 💀
   ```
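   As a rough illustration of why `del` is harmless in a single-style function, 
here is a sketch using two hypothetical stand-in functions (`arrow_style()` and 
`readr_style()` are made up for this example and are not part of the arrow 
package):
   ```r
   # Hypothetical stand-ins, each accepting only one spelling of the argument
   arrow_style <- function(delimiter = ",", quoting = TRUE) delimiter
   readr_style <- function(delim = ",", quote = "\"") delim

   # R's partial argument matching resolves `del` within each function on its own:
   # `del` is a unique prefix of `delimiter` in one and of `delim` in the other
   arrow_result <- arrow_style(del = ";")
   readr_result <- readr_style(del = ";")
   ```
   Both calls succeed because each function sees only one candidate for `del`.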
   So `pmatch()` to the rescue. `pmatch()`, when called like it is here, uses 
the same algorithm for partial matching that R uses to identify named arguments 
in function calls. `pmatch(x, y)` returns a vector of the same length as `x`, 
and in each position, the value will be `NA` when the character string in that 
position in `x` _cannot_ be unambiguously matched to exactly one character 
string in `y` (an exact match takes precedence over partial matches). Names 
that match nothing in either set have already been rejected as `bad_opts` 
above, so if there are any `NA` values in the vector returned by this 
`pmatch()` call, that means at least one of the argument names is ambiguous.
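   To make that concrete, here is a minimal sketch of the same checks run on a 
made-up set of option names. The two name vectors are copied by hand from the 
output above rather than taken from `formals()`, so it runs without arrow 
loaded; the `opt_names` value and the `ambig_opts` variable are assumptions for 
illustration only and are not part of the PR:
   ```r
   # Copied from the printed output above (assumption: these match the formals)
   arrow_opts <- c("delimiter", "quoting", "quote_char", "double_quote",
                   "escaping", "escape_char", "newlines_in_values",
                   "ignore_empty_lines")
   readr_opts <- c("delim", "quote", "escape_double", "escape_backslash",
                   "skip_empty_rows")

   # Hypothetical user input: one shortened name and one exact Arrow-style name
   opt_names <- c("del", "quote_char")

   # Same checks as in the diff
   is_arrow_opt <- !is.na(pmatch(opt_names, arrow_opts))
   is_readr_opt <- !is.na(pmatch(opt_names, readr_opts))
   bad_opts <- opt_names[!is_arrow_opt & !is_readr_opt]  # empty: both names match something

   # Against the combined set, `del` is a prefix of both `delimiter` and `delim`
   is_ambig_opt <- is.na(pmatch(opt_names, c(arrow_opts, readr_opts)))
   ambig_opts <- opt_names[is_ambig_opt]  # "del"
   ```
   One caveat worth keeping in mind: with the default `duplicates.ok = FALSE`, 
an element of `y` that has already been matched is excluded from later matches, 
so the result can depend on which other names are passed alongside.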





