[ 
https://issues.apache.org/jira/browse/ARROW-15912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dragoș Moldovan-Grünfeld updated ARROW-15912:
---------------------------------------------
    Issue Type: Wish  (was: Improvement)

> [C++] Is CSV reader's TimestampParser usable elsewhere?
> -------------------------------------------------------
>
>                 Key: ARROW-15912
>                 URL: https://issues.apache.org/jira/browse/ARROW-15912
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: C++
>            Reporter: Dragoș Moldovan-Grünfeld
>            Priority: Major
>
> The {{TimestampParser}} seems to be able to cycle through several formats. 
> This sort of functionality would be very useful for some of the lubridate 
> bindings that need to behave in a similar way. 
> {code:r}
> library(arrow)
> library(fs)
> library(readr)
> library(tibble)
> tf <- fs::file_temp(ext = "csv")
> fs::file_create(tf)
> sample_times <- tibble(a = c("09/13/2013", "25/12/1998", "09-13-13", 
> "23_Feb_2022", "09/13/2018"))
> write_csv(sample_times, tf)
> read_csv_arrow(tf, 
>                as_data_frame = TRUE,
>                timestamp_parsers = c("%m/%d/%Y", "%d/%m/%Y", "%m-%d-%y", 
> "%d_%b_%Y"))
> #> # A tibble: 5 × 1
> #>   a                  
> #>   <dttm>             
> #> 1 2013-09-13 01:00:00
> #> 2 1998-12-25 00:00:00
> #> 3 2013-09-13 01:00:00
> #> 4 2022-02-23 00:00:00
> #> 5 2018-09-13 01:00:00
> {code}
> For example, in lubridate, {{ymd()}} cycles through all possible formats 
> that have year-month-day components in the right order (e.g. {{"%Y-%m-%d", 
> "%y-%m-%d", "%Y-%b-%d", "%y-%b-%d", "%Y-%B-%d", "%y-%B-%d"}}, etc.).
> I guess my question is: Can we factor out this CSV reader feature so that it 
> is usable elsewhere?
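
The format cycling described in the issue can be sketched in plain R. This is only an illustration under stated assumptions: parse_first_match() is a hypothetical helper, not part of arrow or lubridate, and it relies on base R's strptime() rather than on the C++ TimestampParser the issue asks about.

```r
# Hypothetical helper, not an arrow or lubridate function: for each element,
# keep the result of the first format that parses it.
parse_first_match <- function(x, formats, tz = "UTC") {
  # Start with all-NA timestamps, then fill them in format by format.
  out <- as.POSIXct(rep(NA_real_, length(x)), origin = "1970-01-01", tz = tz)
  for (fmt in formats) {
    todo <- is.na(out)          # elements that no earlier format could parse
    if (!any(todo)) break
    out[todo] <- as.POSIXct(strptime(x[todo], format = fmt, tz = tz))
  }
  out
}

parsed <- parse_first_match(
  c("09/13/2013", "25/12/1998", "09-13-13"),
  c("%m/%d/%Y", "%d/%m/%Y", "%m-%d-%y")
)
```

In this sketch the first format that matches wins, so an ambiguous string such as "01/02/2013" is read with whichever format comes first in the vector. The "%d_%b_%Y" format from the issue is left out here because month-name parsing in strptime() depends on the locale.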



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
