[
https://issues.apache.org/jira/browse/ARROW-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Neal Richardson resolved ARROW-6537.
------------------------------------
Resolution: Fixed
Issue resolved by pull request 7807
[https://github.com/apache/arrow/pull/7807]
> [R] Pass column_types to CSV reader
> -----------------------------------
>
> Key: ARROW-6537
> URL: https://issues.apache.org/jira/browse/ARROW-6537
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, R
> Reporter: Neal Richardson
> Assignee: Romain Francois
> Priority: Major
> Labels: csv, dataset, pull-request-available
> Fix For: 2.0.0
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> See also ARROW-6536. The csv reader may accept a Schema now (I think I saw
> that); otherwise it takes an unordered_map.
> {{read_csv_arrow}} should accept for {{col_types}} a Schema, a named list
> of Types, or the "compact string representation" that {{readr}}
> supports. Per its docs, "c = character, i = integer, n = number, d = double,
> l = logical, f = factor, D = date, T = date time, t = time, ? = guess, or _/-
> to skip the column." So, c = utf8(), i = int32(), d = float64(), l = bool(),
> f = dictionary(int32(), utf8()), D = date32(), T = timestamp(), t = time32(),
> etc. I'm not sure if ? and - are supported, and/or what exactly happens if
> you don't specify types for all columns, but I guess we'll find out, and we
> can make JIRAs if important features are missing.
> Following the existing conventions in csv.R, that compact string
> representation would be encapsulated in {{read_csv_arrow}}, so CsvTableReader
> and the various Csv*Options would only deal with the Arrow C++ interface.
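
The compact-string handling described above can be sketched roughly as follows. This is only an illustration in Python, not the code from the pull request and not the arrow R package's API: the helper name expand_col_types and the table ARROW_TYPE_FOR_CODE are hypothetical, and Arrow types are written as plain strings so the sketch has no dependencies. Only the codes the description maps explicitly are included; "n" is left out because the description gives no Arrow type for it, and the handling of "?" and "_"/"-" is an assumption about what the reader would do.

```python
# Hypothetical sketch: expand a readr-style compact col_types string into a
# per-column mapping. Arrow types are represented as strings for illustration.
ARROW_TYPE_FOR_CODE = {
    "c": "utf8()",
    "i": "int32()",
    "d": "float64()",
    "l": "bool()",
    "f": "dictionary(int32(), utf8())",
    "D": "date32()",
    "T": "timestamp()",
    "t": "time32()",
}
GUESS_CODE = "?"          # assumption: leave the column to type inference
SKIP_CODES = ("_", "-")   # assumption: drop the column from the result


def expand_col_types(spec, col_names):
    """Return (column_types, skipped) for a compact spec such as "ic?-".

    column_types maps column name -> Arrow type string for explicitly typed
    columns; skipped lists the columns marked with "_" or "-".
    """
    if len(spec) != len(col_names):
        raise ValueError(
            f"spec has {len(spec)} codes but there are {len(col_names)} columns"
        )
    column_types = {}
    skipped = []
    for code, name in zip(spec, col_names):
        if code == GUESS_CODE:
            continue  # no entry: the reader would infer this column's type
        if code in SKIP_CODES:
            skipped.append(name)
            continue
        if code not in ARROW_TYPE_FOR_CODE:
            raise ValueError(f"unsupported col_types code {code!r} for {name!r}")
        column_types[name] = ARROW_TYPE_FOR_CODE[code]
    return column_types, skipped
```

Keeping this expansion in the wrapper means the lower-level reader and options objects only ever see a name-to-type mapping, which matches the convention the description proposes for csv.R.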