joelnitta commented on issue #38903: URL: https://github.com/apache/arrow/issues/38903#issuecomment-1907432572
I would add that [the current documentation](https://arrow.apache.org/docs/r/reference/open_delim_dataset.html) says that a "compact string representation" of column types is allowable. This is very similar to the wording of [{readr}](https://readr.tidyverse.org/reference/read_delim.html), so without additional explanation I assumed that's what it meant, but this does not seem to work:

``` r
library(readr)
library(arrow)
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#>     timestamp

# works
read_csv(readr_example("mtcars.csv"), col_types = paste(rep("c", 11), collapse = ""))
#> # A tibble: 32 × 11
#>    mpg   cyl   disp  hp    drat  wt    qsec  vs    am    gear  carb
#>    <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#>  1 21    6     160   110   3.9   2.62  16.46 0     1     4     4
#>  2 21    6     160   110   3.9   2.875 17.02 0     1     4     4
#>  3 22.8  4     108   93    3.85  2.32  18.61 1     1     4     1
#>  4 21.4  6     258   110   3.08  3.215 19.44 1     0     3     1
#>  5 18.7  8     360   175   3.15  3.44  17.02 0     0     3     2
#>  6 18.1  6     225   105   2.76  3.46  20.22 1     0     3     1
#>  7 14.3  8     360   245   3.21  3.57  15.84 0     0     3     4
#>  8 24.4  4     146.7 62    3.69  3.19  20    1     0     4     2
#>  9 22.8  4     140.8 95    3.92  3.15  22.9  1     0     4     2
#> 10 19.2  6     167.6 123   3.92  3.44  18.3  1     0     4     4
#> # ℹ 22 more rows

# works
open_csv_dataset(readr_example("mtcars.csv"))
#> FileSystemDataset with 1 csv file
#> mpg: double
#> cyl: int64
#> disp: double
#> hp: int64
#> drat: double
#> wt: double
#> qsec: double
#> vs: int64
#> am: int64
#> gear: int64
#> carb: int64

# doesn't work
open_csv_dataset(readr_example("mtcars.csv"), col_types = paste(rep("c", 11), collapse = ""))
#> Error:
#> ! Unsupported `col_types` specification.
#> ℹ `col_types` must be NULL, or a <Schema>.
#> Backtrace:
#>      ▆
#>   1. └─arrow (local) `<fn>`(...)
#>   2.   └─arrow::open_dataset(...)
#>   3.     └─DatasetFactory$create(...)
#>   4.       └─FileFormat$create(...)
#>   5.         └─CsvFileFormat$create(...)
#>   6.           └─arrow:::check_csv_file_format_args(dots, partitioning = partitioning)
#>   7.             ├─base::do.call(csv_file_format_convert_opts, args)
#>   8.             └─arrow (local) `<fn>`(...)
#>   9.               ├─base::do.call(csv_convert_options, opts)
#>  10.               └─arrow (local) `<fn>`(...)
#>  11.                 └─rlang::abort(c("Unsupported `col_types` specification.", i = "`col_types` must be NULL, or a <Schema>."))
```

<sup>Created on 2024-01-24 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>

<details style="margin-bottom:10px;">
<summary>
Session info
</summary>

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.2 (2023-10-31)
#>  os       macOS Sonoma 14.1.2
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    UTF-8
#>  tz       Asia/Tokyo
#>  date     2024-01-24
#>  pandoc   3.1.2 @ /usr/local/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version  date (UTC) lib source
#>  arrow       * 14.0.0.2 2023-12-02 [1] CRAN (R 4.3.1)
#>  assertthat    0.2.1    2019-03-21 [1] CRAN (R 4.3.0)
#>  bit           4.0.5    2022-11-15 [1] CRAN (R 4.3.0)
#>  bit64         4.0.5    2020-08-30 [1] CRAN (R 4.3.0)
#>  cli           3.6.2    2023-12-11 [1] CRAN (R 4.3.1)
#>  crayon        1.5.2    2022-09-29 [1] CRAN (R 4.3.0)
#>  digest        0.6.33   2023-07-07 [1] CRAN (R 4.3.0)
#>  evaluate      0.23     2023-11-01 [1] CRAN (R 4.3.1)
#>  fansi         1.0.6    2023-12-08 [1] CRAN (R 4.3.1)
#>  fastmap       1.1.1    2023-02-24 [1] CRAN (R 4.3.0)
#>  fs            1.6.3    2023-07-20 [1] CRAN (R 4.3.0)
#>  glue          1.6.2    2022-02-24 [1] CRAN (R 4.3.0)
#>  hms           1.1.3    2023-03-21 [1] CRAN (R 4.3.0)
#>  htmltools     0.5.7    2023-11-03 [1] CRAN (R 4.3.1)
#>  knitr         1.45     2023-10-30 [1] CRAN (R 4.3.1)
#>  lifecycle     1.0.4    2023-11-07 [1] CRAN (R 4.3.1)
#>  magrittr      2.0.3    2022-03-30 [1] CRAN (R 4.3.0)
#>  pillar        1.9.0    2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgconfig     2.0.3    2019-09-22 [1] CRAN (R 4.3.0)
#>  purrr         1.0.2    2023-08-10 [1] CRAN (R 4.3.0)
#>  R.cache       0.16.0   2022-07-21 [1] CRAN (R 4.3.0)
#>  R.methodsS3   1.8.2    2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo          1.25.0   2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils       2.12.3   2023-11-18 [1] CRAN (R 4.3.1)
#>  R6            2.5.1    2021-08-19 [1] CRAN (R 4.3.0)
#>  readr       * 2.1.4    2023-02-10 [1] CRAN (R 4.3.0)
#>  reprex        2.0.2    2022-08-17 [1] CRAN (R 4.3.0)
#>  rlang         1.1.2    2023-11-04 [1] CRAN (R 4.3.1)
#>  rmarkdown     2.25     2023-09-18 [1] CRAN (R 4.3.1)
#>  sessioninfo   1.2.2    2021-12-06 [1] CRAN (R 4.3.0)
#>  styler        1.10.2   2023-08-29 [1] CRAN (R 4.3.0)
#>  tibble        3.2.1    2023-03-20 [1] CRAN (R 4.3.0)
#>  tidyselect    1.2.0    2022-10-10 [1] CRAN (R 4.3.0)
#>  tzdb          0.4.0    2023-05-12 [1] CRAN (R 4.3.0)
#>  utf8          1.2.4    2023-10-22 [1] CRAN (R 4.3.1)
#>  vctrs         0.6.5    2023-12-01 [1] CRAN (R 4.3.1)
#>  vroom         1.6.5    2023-12-05 [1] CRAN (R 4.3.1)
#>  withr         2.5.2    2023-10-30 [1] CRAN (R 4.3.1)
#>  xfun          0.41     2023-11-01 [1] CRAN (R 4.3.1)
#>  yaml          2.3.8    2023-12-11 [1] CRAN (R 4.3.1)
#>
#>  [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

</details>
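
Since the error message above states that `col_types` must be `NULL` or a `<Schema>`, one possible workaround is to spell the types out as an Arrow schema instead of a compact string. This is only a minimal sketch: it assumes the eleven column names shown in the output above, and that a schema built with `schema()` and `string()` from {arrow} is accepted here, as the message suggests. `string_schema` and `ds` are illustrative names, not part of the original report.

``` r
library(readr)
library(arrow)

# Assumption: one string-typed field per column of mtcars.csv,
# mirroring the readr spec paste(rep("c", 11), collapse = "")
string_schema <- schema(
  mpg = string(), cyl = string(), disp = string(), hp = string(),
  drat = string(), wt = string(), qsec = string(), vs = string(),
  am = string(), gear = string(), carb = string()
)

# Pass the Schema where the compact string was rejected
ds <- open_csv_dataset(readr_example("mtcars.csv"), col_types = string_schema)

# Inspect the resulting column types
ds$schema
```

This does not resolve the mismatch between the documentation and the behaviour; it only shows the form of `col_types` that the error message says is supported.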
