atsyplenkov opened a new pull request, #45719:
URL: https://github.com/apache/arrow/pull/45719

   ### Rationale for this change
   Hi, can you please consider this tiny update to the docs? The current 
documentation is misleading about how to specify `col_types` when a delimited 
file is scanned using `open_csv_dataset()`, `open_delim_dataset()`, etc. As 
currently written, it suggests that column types can be declared with the 
compact string representation that `readr` uses. 
   
   
https://github.com/apache/arrow/blob/3c8fe098c7f5e0e40bd06bc6afca8412eb81f56e/r/man/open_delim_dataset.Rd#L164-L165
   
   However, that does not work, as the reprex below shows:
   
   ```r
   library(arrow)
   #> 
   #> Attaching package: 'arrow'
   #> The following object is masked from 'package:utils':
   #> 
   #>     timestamp
   tf <- tempfile()
   dir.create(tf)
   df <- data.frame(x = c("1", "2", "NULL"))
   
   file_path <- file.path(tf, "file1.txt")
   write.table(df, file_path, sep = ",", row.names = FALSE)
   
   open_csv_dataset(file_path, na = c("", "NA", "NULL"), col_types = "c")
   #> Error:
   #> ! Unsupported `col_types` specification.
   #> ℹ `col_types` must be NULL, or a <Schema>.
   
   unlink(tf, recursive = TRUE)
   ```
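   
   For contrast, here is a minimal sketch of the form the error message asks 
for, where `col_types` is an Arrow `Schema` built with `schema()` rather than a 
`readr`-style compact string. The choice of `int32()` for column `x` and the use 
of `writeLines()` to produce an unquoted file are assumptions made for 
illustration, and this is not necessarily the example added to the docs.
   
   ```r
   library(arrow)
   
   tf <- tempfile()
   dir.create(tf)
   file_path <- file.path(tf, "file1.txt")
   # Write a small unquoted CSV by hand: a header row, two integers, and a "NULL" marker
   writeLines(c("x", "1", "2", "NULL"), file_path)
   
   # Pass a Schema (not a compact string such as "c") to `col_types`
   ds <- open_csv_dataset(
     file_path,
     na = c("", "NA", "NULL"),
     col_types = schema(x = int32())
   )
   
   # Inspect the type that the dataset reports for column `x`
   x_type <- ds$schema$GetFieldByName("x")$type$ToString()
   print(x_type)
   
   unlink(tf, recursive = TRUE)
   ```
   
   A partial schema should be enough here: only the columns whose types need 
overriding have to be listed.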
   
   ### What changes are included in this PR?
   This PR clarifies what should be passed to the `col_types` argument and 
adds a basic example for `open_csv_dataset()`.
   
   ### Are these changes tested?
   Not needed, as only the R documentation has been updated.
   
   ### Are there any user-facing changes?
   Only the R documentation has been updated.
   

