[ 
https://issues.apache.org/jira/browse/ARROW-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-6537.
------------------------------------
    Resolution: Fixed

Issue resolved by pull request 7807
[https://github.com/apache/arrow/pull/7807]

> [R] Pass column_types to CSV reader
> -----------------------------------
>
>                 Key: ARROW-6537
>                 URL: https://issues.apache.org/jira/browse/ARROW-6537
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, R
>            Reporter: Neal Richardson
>            Assignee: Romain Francois
>            Priority: Major
>              Labels: csv, dataset, pull-request-available
>             Fix For: 2.0.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> See also ARROW-6536. The CSV reader may already accept a Schema (I think I
> saw that); otherwise it takes an unordered_map.
> {{read_csv_arrow}} should accept for {{col_types}} either a Schema, a named
> list of Types, or the "compact string representation" that {{readr}} 
> supports. Per its docs, "c = character, i = integer, n = number, d = double, 
> l = logical, f = factor, D = date, T = date time, t = time, ? = guess, or _/- 
> to skip the column." So, c = utf8(), i = int32(), d = float64(), l = bool(), 
> f = dictionary(int32(), utf8()), D = date32(), T = timestamp(), t = time32(), 
> etc. I'm not sure whether ? and - are supported, or what exactly happens if
> you don't specify types for all columns, but I guess we'll find out, and we
> can file JIRAs if important features are missing.
>
> Following the existing conventions in csv.R, that compact string
> representation would be encapsulated in {{read_csv_arrow}}, so CsvTableReader 
> and the various Csv*Options would only deal with the Arrow C++ interface. 
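
The compact-string mapping described above can be sketched roughly as follows. This is only an illustration in plain Python, not the code from the pull request: the function name expand_col_types and the constant names are hypothetical, and Arrow types are represented as plain strings rather than real Arrow objects. It assumes that "?" leaves a column to type inference and that "_" or "-" marks a column to skip. Since the description gives no Arrow type for "n", the sketch rejects it rather than guessing.

```python
# Hypothetical sketch: expand a readr-style compact col_types string into
# per-column Arrow type names, using the mapping listed in the issue text.
# Type names are plain strings here; no Arrow library is involved.

COMPACT_TO_ARROW = {
    "c": "utf8()",
    "i": "int32()",
    "d": "float64()",
    "l": "bool()",
    "f": "dictionary(int32(), utf8())",
    "D": "date32()",
    "T": "timestamp()",
    "t": "time32()",
}
GUESS_CODE = "?"          # assumed: leave the column to type inference
SKIP_CODES = {"_", "-"}   # assumed: drop the column from the result


def expand_col_types(compact, col_names):
    """Return (types, skipped): a dict of column name -> Arrow type name,
    and a list of column names marked to be skipped."""
    if len(compact) != len(col_names):
        raise ValueError("compact string must have one code per column")
    types = {}
    skipped = []
    for code, name in zip(compact, col_names):
        if code == GUESS_CODE:
            continue  # no explicit type; the reader would infer it
        if code in SKIP_CODES:
            skipped.append(name)
            continue
        if code not in COMPACT_TO_ARROW:
            # Includes "n": the issue text gives no Arrow type for it.
            raise ValueError("unsupported col_types code: " + repr(code))
        types[name] = COMPACT_TO_ARROW[code]
    return types, skipped
```

For example, expand_col_types("ci?-", ["id", "n_rows", "note", "tmp"]) would give explicit types for "id" and "n_rows", leave "note" to be guessed, and report "tmp" as skipped. Columns without an explicit type are simply absent from the returned dict, which is one possible answer to the open question of what happens when not all columns are specified.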



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
