joelnitta opened a new issue, #39811:
URL: https://github.com/apache/arrow/issues/39811

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   (originally posted as a comment on #38903, but suggested by @thisisnic to 
file as its own issue)
   
   [The current documentation of 
open_delim_dataset()](https://arrow.apache.org/docs/r/reference/open_delim_dataset.html)
 says that a "compact string representation" of column types can be used for 
the `col_types` argument. This is nearly identical to the wording for the [`col_types` 
argument of {readr}](https://readr.tidyverse.org/reference/read_delim.html), 
but no additional explanation is provided. I therefore assumed it accepts the 
same compact strings as {readr} (for example, `"c"` for a character column), 
but that does not seem to work:
   
   ``` r
   library(readr)
   library(arrow)
   #> 
   #> Attaching package: 'arrow'
   #> The following object is masked from 'package:utils':
   #> 
   #>     timestamp
   
   # works
   read_csv(readr_example("mtcars.csv"), col_types = paste(rep("c", 11), collapse = ""))
   #> # A tibble: 32 × 11
   #>    mpg   cyl   disp  hp    drat  wt    qsec  vs    am    gear  carb 
   #>    <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
   #>  1 21    6     160   110   3.9   2.62  16.46 0     1     4     4    
   #>  2 21    6     160   110   3.9   2.875 17.02 0     1     4     4    
   #>  3 22.8  4     108   93    3.85  2.32  18.61 1     1     4     1    
   #>  4 21.4  6     258   110   3.08  3.215 19.44 1     0     3     1    
   #>  5 18.7  8     360   175   3.15  3.44  17.02 0     0     3     2    
   #>  6 18.1  6     225   105   2.76  3.46  20.22 1     0     3     1    
   #>  7 14.3  8     360   245   3.21  3.57  15.84 0     0     3     4    
   #>  8 24.4  4     146.7 62    3.69  3.19  20    1     0     4     2    
   #>  9 22.8  4     140.8 95    3.92  3.15  22.9  1     0     4     2    
   #> 10 19.2  6     167.6 123   3.92  3.44  18.3  1     0     4     4    
   #> # ℹ 22 more rows
   
   # works
   open_csv_dataset(readr_example("mtcars.csv"))
   #> FileSystemDataset with 1 csv file
   #> mpg: double
   #> cyl: int64
   #> disp: double
   #> hp: int64
   #> drat: double
   #> wt: double
   #> qsec: double
   #> vs: int64
   #> am: int64
   #> gear: int64
   #> carb: int64
   
   # doesn't work
   open_csv_dataset(readr_example("mtcars.csv"), col_types = paste(rep("c", 11), collapse = ""))
   #> Error:
   #> ! Unsupported `col_types` specification.
   #> ℹ `col_types` must be NULL, or a <Schema>.
   #> Backtrace:
   #>      ▆
   #>   1. └─arrow (local) `<fn>`(...)
   #>   2.   └─arrow::open_dataset(...)
   #>   3.     └─DatasetFactory$create(...)
   #>   4.       └─FileFormat$create(...)
   #>   5.         └─CsvFileFormat$create(...)
   #>   6.           └─arrow:::check_csv_file_format_args(dots, partitioning = partitioning)
   #>   7.             ├─base::do.call(csv_file_format_convert_opts, args)
   #>   8.             └─arrow (local) `<fn>`(...)
   #>   9.               ├─base::do.call(csv_convert_options, opts)
   #>  10.               └─arrow (local) `<fn>`(...)
   #>  11.                 └─rlang::abort(c("Unsupported `col_types` specification.", i = "`col_types` must be NULL, or a <Schema>."))
   ```
   
   <sup>Created on 2024-01-24 with [reprex 
v2.0.2](https://reprex.tidyverse.org)</sup>
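   
   The error message points at passing a `<Schema>` instead of a compact string. A minimal sketch of that route follows, assuming `col_types` accepts a `Schema` built with `schema()` exactly as the error text indicates. The column names are copied from the output above, `all_strings` is just an illustrative name, and this is untested across arrow versions:
   
   ``` r
   library(arrow)
   
   # readr is used here only to locate the example file from the reprex
   csv_path <- readr::readr_example("mtcars.csv")
   
   # Column names as printed by open_csv_dataset() in the reprex
   col_names <- c("mpg", "cyl", "disp", "hp", "drat", "wt",
                  "qsec", "vs", "am", "gear", "carb")
   
   # Build a Schema typing every column as a string,
   # the Arrow counterpart of readr's "c" shorthand
   all_strings <- do.call(
     schema,
     setNames(rep(list(string()), length(col_names)), col_names)
   )
   
   # Assumption: a full Schema passed via col_types is accepted here,
   # as the error message ("must be NULL, or a <Schema>") suggests
   ds <- open_csv_dataset(csv_path, col_types = all_strings)
   ds
   ```
   
   If this is the intended usage, the documentation could say so explicitly rather than mentioning a compact string representation.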
   
   <details style="margin-bottom:10px;">
   <summary>
   Session info
   </summary>
   
   ``` r
   sessioninfo::session_info()
   #> ─ Session info ───────────────────────────────────────────────────────────────
   #>  setting  value
   #>  version  R version 4.3.2 (2023-10-31)
   #>  os       macOS Sonoma 14.1.2
   #>  system   aarch64, darwin20
   #>  ui       X11
   #>  language (EN)
   #>  collate  en_US.UTF-8
   #>  ctype    UTF-8
   #>  tz       Asia/Tokyo
   #>  date     2024-01-24
   #>  pandoc   3.1.2 @ /usr/local/bin/ (via rmarkdown)
   #> 
   #> ─ Packages ───────────────────────────────────────────────────────────────────
   #>  package     * version  date (UTC) lib source
   #>  arrow       * 14.0.0.2 2023-12-02 [1] CRAN (R 4.3.1)
   #>  assertthat    0.2.1    2019-03-21 [1] CRAN (R 4.3.0)
   #>  bit           4.0.5    2022-11-15 [1] CRAN (R 4.3.0)
   #>  bit64         4.0.5    2020-08-30 [1] CRAN (R 4.3.0)
   #>  cli           3.6.2    2023-12-11 [1] CRAN (R 4.3.1)
   #>  crayon        1.5.2    2022-09-29 [1] CRAN (R 4.3.0)
   #>  digest        0.6.33   2023-07-07 [1] CRAN (R 4.3.0)
   #>  evaluate      0.23     2023-11-01 [1] CRAN (R 4.3.1)
   #>  fansi         1.0.6    2023-12-08 [1] CRAN (R 4.3.1)
   #>  fastmap       1.1.1    2023-02-24 [1] CRAN (R 4.3.0)
   #>  fs            1.6.3    2023-07-20 [1] CRAN (R 4.3.0)
   #>  glue          1.6.2    2022-02-24 [1] CRAN (R 4.3.0)
   #>  hms           1.1.3    2023-03-21 [1] CRAN (R 4.3.0)
   #>  htmltools     0.5.7    2023-11-03 [1] CRAN (R 4.3.1)
   #>  knitr         1.45     2023-10-30 [1] CRAN (R 4.3.1)
   #>  lifecycle     1.0.4    2023-11-07 [1] CRAN (R 4.3.1)
   #>  magrittr      2.0.3    2022-03-30 [1] CRAN (R 4.3.0)
   #>  pillar        1.9.0    2023-03-22 [1] CRAN (R 4.3.0)
   #>  pkgconfig     2.0.3    2019-09-22 [1] CRAN (R 4.3.0)
   #>  purrr         1.0.2    2023-08-10 [1] CRAN (R 4.3.0)
   #>  R.cache       0.16.0   2022-07-21 [1] CRAN (R 4.3.0)
   #>  R.methodsS3   1.8.2    2022-06-13 [1] CRAN (R 4.3.0)
   #>  R.oo          1.25.0   2022-06-12 [1] CRAN (R 4.3.0)
   #>  R.utils       2.12.3   2023-11-18 [1] CRAN (R 4.3.1)
   #>  R6            2.5.1    2021-08-19 [1] CRAN (R 4.3.0)
   #>  readr       * 2.1.4    2023-02-10 [1] CRAN (R 4.3.0)
   #>  reprex        2.0.2    2022-08-17 [1] CRAN (R 4.3.0)
   #>  rlang         1.1.2    2023-11-04 [1] CRAN (R 4.3.1)
   #>  rmarkdown     2.25     2023-09-18 [1] CRAN (R 4.3.1)
   #>  sessioninfo   1.2.2    2021-12-06 [1] CRAN (R 4.3.0)
   #>  styler        1.10.2   2023-08-29 [1] CRAN (R 4.3.0)
   #>  tibble        3.2.1    2023-03-20 [1] CRAN (R 4.3.0)
   #>  tidyselect    1.2.0    2022-10-10 [1] CRAN (R 4.3.0)
   #>  tzdb          0.4.0    2023-05-12 [1] CRAN (R 4.3.0)
   #>  utf8          1.2.4    2023-10-22 [1] CRAN (R 4.3.1)
   #>  vctrs         0.6.5    2023-12-01 [1] CRAN (R 4.3.1)
   #>  vroom         1.6.5    2023-12-05 [1] CRAN (R 4.3.1)
   #>  withr         2.5.2    2023-10-30 [1] CRAN (R 4.3.1)
   #>  xfun          0.41     2023-11-01 [1] CRAN (R 4.3.1)
   #>  yaml          2.3.8    2023-12-11 [1] CRAN (R 4.3.1)
   #> 
   #>  [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
   #> 
   #> ──────────────────────────────────────────────────────────────────────────────
   ```
   
   </details>
   
   ### Component(s)
   
   R

