atsyplenkov opened a new pull request, #45719: URL: https://github.com/apache/arrow/pull/45719
### Rationale for this change

Hi, can you please consider this tiny update to the docs? The current documentation is misleading about how to specify `col_types` when a delimited file is scanned using `open_csv_dataset`, `open_delim_dataset`, etc. From what is currently written, one may assume that column types can be declared with the compact string representation that `readr` uses.

https://github.com/apache/arrow/blob/3c8fe098c7f5e0e40bd06bc6afca8412eb81f56e/r/man/open_delim_dataset.Rd#L164-L165

But it doesn't work. See the reprex below:

```r
library(arrow)
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#>     timestamp

tf <- tempfile()
dir.create(tf)
df <- data.frame(x = c("1", "2", "NULL"))

file_path <- file.path(tf, "file1.txt")
write.table(df, file_path, sep = ",", row.names = FALSE)

open_csv_dataset(file_path, na = c("", "NA", "NULL"), col_types = "c")
#> Error:
#> ! Unsupported `col_types` specification.
#> ℹ `col_types` must be NULL, or a <Schema>.

unlink(tf)
```

### What changes are included in this PR?

This PR gives a clearer explanation of what should be passed to the `col_types` argument, along with a basic example for `open_csv_dataset()`.

### Are these changes tested?

Not needed, as only the R documentation has been updated.

### Are there any user-facing changes?

Only the R documentation has been updated.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
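
For reference, the error message in the reprex above says that `col_types` must be `NULL` or a `<Schema>`. A minimal sketch of the corresponding call is shown here, assuming that a schema built with `schema(x = string())` is an acceptable way to declare the column type. The choice of `string()` for column `x` and the `quote = FALSE` argument are illustrative assumptions, not part of the PR.

```r
library(arrow)

# Recreate the sample file from the reprex (quote = FALSE is an
# assumption made here to keep the file contents unquoted).
tf <- tempfile()
dir.create(tf)
df <- data.frame(x = c("1", "2", "NULL"))

file_path <- file.path(tf, "file1.txt")
write.table(df, file_path, sep = ",", row.names = FALSE, quote = FALSE)

# Pass an Arrow Schema instead of a readr-style compact string.
ds <- open_csv_dataset(
  file_path,
  na = c("", "NA", "NULL"),
  col_types = schema(x = string())
)

# Inspect the resulting dataset schema.
print(ds$schema)

unlink(tf, recursive = TRUE)
```

The schema names only the columns whose types should be fixed, so the same pattern should extend to files with several columns, under the same assumptions.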
