joelnitta opened a new issue, #39811:
URL: https://github.com/apache/arrow/issues/39811
### Describe the bug, including details regarding any error messages, version, and platform.

(originally posted as a comment on #38903, but suggested by @thisisnic to file as its own issue)

[The current documentation of open_delim_dataset()](https://arrow.apache.org/docs/r/reference/open_delim_dataset.html) says that a "compact string representation" of column types can be used for the `col_types` argument. This is nearly identical to the wording for the [`col_types` argument of {readr}](https://readr.tidyverse.org/reference/read_delim.html), but no additional explanation is provided. So I assumed it meant the same thing as in {readr}, but this does not seem to work:

``` r
library(readr)
library(arrow)
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#>     timestamp

# works
read_csv(readr_example("mtcars.csv"), col_types = paste(rep("c", 11), collapse = ""))
#> # A tibble: 32 × 11
#>    mpg   cyl   disp  hp    drat  wt    qsec  vs    am    gear  carb
#>    <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#>  1 21    6     160   110   3.9   2.62  16.46 0     1     4     4
#>  2 21    6     160   110   3.9   2.875 17.02 0     1     4     4
#>  3 22.8  4     108   93    3.85  2.32  18.61 1     1     4     1
#>  4 21.4  6     258   110   3.08  3.215 19.44 1     0     3     1
#>  5 18.7  8     360   175   3.15  3.44  17.02 0     0     3     2
#>  6 18.1  6     225   105   2.76  3.46  20.22 1     0     3     1
#>  7 14.3  8     360   245   3.21  3.57  15.84 0     0     3     4
#>  8 24.4  4     146.7 62    3.69  3.19  20    1     0     4     2
#>  9 22.8  4     140.8 95    3.92  3.15  22.9  1     0     4     2
#> 10 19.2  6     167.6 123   3.92  3.44  18.3  1     0     4     4
#> # ℹ 22 more rows

# works
open_csv_dataset(readr_example("mtcars.csv"))
#> FileSystemDataset with 1 csv file
#> mpg: double
#> cyl: int64
#> disp: double
#> hp: int64
#> drat: double
#> wt: double
#> qsec: double
#> vs: int64
#> am: int64
#> gear: int64
#> carb: int64

# doesn't work
open_csv_dataset(readr_example("mtcars.csv"), col_types = paste(rep("c", 11), collapse = ""))
#> Error:
#> ! Unsupported `col_types` specification.
#> ℹ `col_types` must be NULL, or a <Schema>.
#> Backtrace:
#>      ▆
#>   1. └─arrow (local) `<fn>`(...)
#>   2.   └─arrow::open_dataset(...)
#>   3.     └─DatasetFactory$create(...)
#>   4.       └─FileFormat$create(...)
#>   5.         └─CsvFileFormat$create(...)
#>   6.           └─arrow:::check_csv_file_format_args(dots, partitioning = partitioning)
#>   7.             ├─base::do.call(csv_file_format_convert_opts, args)
#>   8.             └─arrow (local) `<fn>`(...)
#>   9.               ├─base::do.call(csv_convert_options, opts)
#>  10.               └─arrow (local) `<fn>`(...)
#>  11.                 └─rlang::abort(c("Unsupported `col_types` specification.", i = "`col_types` must be NULL, or a <Schema>."))
```

<sup>Created on 2024-01-24 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>

<details style="margin-bottom:10px;">
<summary>
Session info
</summary>

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.2 (2023-10-31)
#>  os       macOS Sonoma 14.1.2
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    UTF-8
#>  tz       Asia/Tokyo
#>  date     2024-01-24
#>  pandoc   3.1.2 @ /usr/local/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version  date (UTC) lib source
#>  arrow       * 14.0.0.2 2023-12-02 [1] CRAN (R 4.3.1)
#>  assertthat    0.2.1    2019-03-21 [1] CRAN (R 4.3.0)
#>  bit           4.0.5    2022-11-15 [1] CRAN (R 4.3.0)
#>  bit64         4.0.5    2020-08-30 [1] CRAN (R 4.3.0)
#>  cli           3.6.2    2023-12-11 [1] CRAN (R 4.3.1)
#>  crayon        1.5.2    2022-09-29 [1] CRAN (R 4.3.0)
#>  digest        0.6.33   2023-07-07 [1] CRAN (R 4.3.0)
#>  evaluate      0.23     2023-11-01 [1] CRAN (R 4.3.1)
#>  fansi         1.0.6    2023-12-08 [1] CRAN (R 4.3.1)
#>  fastmap       1.1.1    2023-02-24 [1] CRAN (R 4.3.0)
#>  fs            1.6.3    2023-07-20 [1] CRAN (R 4.3.0)
#>  glue          1.6.2    2022-02-24 [1] CRAN (R 4.3.0)
#>  hms           1.1.3    2023-03-21 [1] CRAN (R 4.3.0)
#>  htmltools     0.5.7    2023-11-03 [1] CRAN (R 4.3.1)
#>  knitr         1.45     2023-10-30 [1] CRAN (R 4.3.1)
#>  lifecycle     1.0.4    2023-11-07 [1] CRAN (R 4.3.1)
#>  magrittr      2.0.3    2022-03-30 [1] CRAN (R 4.3.0)
#>  pillar        1.9.0    2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgconfig     2.0.3    2019-09-22 [1] CRAN (R 4.3.0)
#>  purrr         1.0.2    2023-08-10 [1] CRAN (R 4.3.0)
#>  R.cache       0.16.0   2022-07-21 [1] CRAN (R 4.3.0)
#>  R.methodsS3   1.8.2    2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo          1.25.0   2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils       2.12.3   2023-11-18 [1] CRAN (R 4.3.1)
#>  R6            2.5.1    2021-08-19 [1] CRAN (R 4.3.0)
#>  readr       * 2.1.4    2023-02-10 [1] CRAN (R 4.3.0)
#>  reprex        2.0.2    2022-08-17 [1] CRAN (R 4.3.0)
#>  rlang         1.1.2    2023-11-04 [1] CRAN (R 4.3.1)
#>  rmarkdown     2.25     2023-09-18 [1] CRAN (R 4.3.1)
#>  sessioninfo   1.2.2    2021-12-06 [1] CRAN (R 4.3.0)
#>  styler        1.10.2   2023-08-29 [1] CRAN (R 4.3.0)
#>  tibble        3.2.1    2023-03-20 [1] CRAN (R 4.3.0)
#>  tidyselect    1.2.0    2022-10-10 [1] CRAN (R 4.3.0)
#>  tzdb          0.4.0    2023-05-12 [1] CRAN (R 4.3.0)
#>  utf8          1.2.4    2023-10-22 [1] CRAN (R 4.3.1)
#>  vctrs         0.6.5    2023-12-01 [1] CRAN (R 4.3.1)
#>  vroom         1.6.5    2023-12-05 [1] CRAN (R 4.3.1)
#>  withr         2.5.2    2023-10-30 [1] CRAN (R 4.3.1)
#>  xfun          0.41     2023-11-01 [1] CRAN (R 4.3.1)
#>  yaml          2.3.8    2023-12-11 [1] CRAN (R 4.3.1)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
```

</details>

### Component(s)

R
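
For comparison, the error message in the reprex says that `col_types` must be `NULL` or a `<Schema>`. The sketch below shows what the Schema form of the same request (all columns read as strings) might look like. It is an untested illustration, not a confirmed workaround: it assumes that `open_csv_dataset()` accepts a full Schema for `col_types`, as the error message suggests, and it writes the built-in `mtcars` data to a temporary file (the name `csv_path` is made up here) instead of using `readr_example()`.

``` r
library(arrow)

# Write the built-in mtcars data to a temporary CSV so the sketch is
# self-contained (assumption: stands in for readr_example("mtcars.csv")).
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)

# Build a named list of Arrow string types, one entry per column,
# then turn it into a Schema. This mirrors col_types = "ccccccccccc" in readr.
col_names <- names(mtcars)
string_types <- setNames(rep(list(string()), length(col_names)), col_names)
all_strings <- do.call(schema, string_types)

# Pass the Schema instead of the compact string representation.
ds <- open_csv_dataset(csv_path, col_types = all_strings)
ds$schema
```

If this behaves as assumed, every field in `ds$schema` should be reported as `string`. The documentation question remains either way: the compact string form is described in the docs but rejected at run time.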
