Dragoș Moldovan-Grünfeld created ARROW-15912:
------------------------------------------------

             Summary: [C++] Is CSV reader's TimestampParser usable elsewhere?
                 Key: ARROW-15912
                 URL: https://issues.apache.org/jira/browse/ARROW-15912
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Dragoș Moldovan-Grünfeld


The CSV reader's {{TimestampParser}} support can cycle through several formats, trying each in order until one succeeds. This sort of functionality would be very useful for some of the lubridate bindings, which need to behave in a similar way.

{code:r}
library(arrow)
library(fs)
library(readr)
library(tibble)

tf <- fs::file_temp(ext = "csv")
fs::file_create(tf)

sample_times <- tibble(a = c("09/13/2013", "25/12/1998", "09-13-13",
                             "23_Feb_2022", "09/13/2018"))
write_csv(sample_times, tf)


read_csv_arrow(tf,
               as_data_frame = TRUE,
               timestamp_parsers = c("%m/%d/%Y", "%d/%m/%Y", "%m-%d-%y",
                                     "%d_%b_%Y"))
#> # A tibble: 5 × 1
#>   a                  
#>   <dttm>             
#> 1 2013-09-13 01:00:00
#> 2 1998-12-25 00:00:00
#> 3 2013-09-13 01:00:00
#> 4 2022-02-23 00:00:00
#> 5 2018-09-13 01:00:00
{code}
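For illustration, here is a minimal standalone C++ sketch of the same idea. It does not use Arrow's {{TimestampParser}} classes: each strptime-style format is tried in order, and the first one that consumes the entire string wins. {{ParseWithFormats}} and {{ParsedDate}} are hypothetical names, and the sketch assumes a POSIX system (e.g. Linux/glibc) where {{strptime}} is available.

{code:cpp}
// Sketch only: multi-format date parsing, first full match wins.
// Assumes a POSIX system providing strptime(); not an Arrow API.
#include <time.h>
#include <cstddef>
#include <cstdio>
#include <ctime>
#include <optional>
#include <string>
#include <vector>

// Hypothetical result type: parsed calendar fields plus the matching format.
struct ParsedDate {
  std::tm fields;
  std::size_t format_index;
};

// Hypothetical helper: try each format in order; return the first success.
std::optional<ParsedDate> ParseWithFormats(const std::string& input,
                                           const std::vector<std::string>& formats) {
  for (std::size_t i = 0; i < formats.size(); ++i) {
    std::tm tm{};  // strptime only fills the fields it actually parses
    const char* end = strptime(input.c_str(), formats[i].c_str(), &tm);
    // Accept only if the format matched AND consumed the whole string.
    if (end != nullptr && *end == '\0') {
      return ParsedDate{tm, i};
    }
  }
  return std::nullopt;  // no format matched
}

int main() {
  const std::vector<std::string> formats = {"%m/%d/%Y", "%d/%m/%Y",
                                            "%m-%d-%y", "%d_%b_%Y"};
  const std::vector<std::string> inputs = {"09/13/2013", "25/12/1998", "09-13-13",
                                           "23_Feb_2022", "09/13/2018"};
  for (const std::string& s : inputs) {
    std::optional<ParsedDate> parsed = ParseWithFormats(s, formats);
    if (!parsed) {
      std::printf("%-12s -> no match\n", s.c_str());
      continue;
    }
    const std::tm& t = parsed->fields;
    std::printf("%-12s -> %04d-%02d-%02d (format %s)\n", s.c_str(),
                t.tm_year + 1900, t.tm_mon + 1, t.tm_mday,
                formats[parsed->format_index].c_str());
  }
}
{code}

Order matters: a genuinely ambiguous value such as "01/02/2013" is assigned to whichever format comes first. Requiring the whole input to be consumed also keeps a format from "succeeding" on a prefix and silently dropping the rest of the value.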

For example, lubridate's {{ymd()}} cycles through all possible formats that have
year, month and day components in that order (e.g.
{{"%Y-%m-%d", "%y-%m-%d", "%Y-%b-%d", "%y-%b-%d", "%Y-%B-%d", "%y-%B-%d"}}, etc.).

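A rough sketch of how a binding could generate such a candidate list from the component order instead of spelling it out by hand. This is not how lubridate implements {{ymd()}}; the {{ExpandOrder}} helper, the caller-supplied separator list and the per-component directives ({{%Y}}/{{%y}}, {{%m}}/{{%b}}/{{%B}}, {{%d}}) are assumptions made for illustration.

{code:cpp}
// Sketch only: expand a component order such as "ymd" into candidate
// strptime-style formats. Hypothetical helper, not an Arrow or lubridate API.
#include <cstddef>
#include <cstdio>
#include <map>
#include <string>
#include <utility>
#include <vector>

std::vector<std::string> ExpandOrder(const std::string& order,
                                     const std::vector<std::string>& separators) {
  // Assumed directive choices per component; lubridate's real set is larger.
  // .at() throws std::out_of_range for characters other than y, m, d.
  const std::map<char, std::vector<std::string>> directives = {
      {'y', {"%Y", "%y"}}, {'m', {"%m", "%b", "%B"}}, {'d', {"%d"}}};
  std::vector<std::string> result;
  for (const std::string& sep : separators) {
    // Build the Cartesian product one component at a time, keeping the order.
    std::vector<std::string> partial = {""};
    for (std::size_t i = 0; i < order.size(); ++i) {
      std::vector<std::string> next;
      for (const std::string& prefix : partial) {
        for (const std::string& d : directives.at(order[i])) {
          next.push_back(i == 0 ? d : prefix + sep + d);
        }
      }
      partial = std::move(next);
    }
    result.insert(result.end(), partial.begin(), partial.end());
  }
  return result;
}

int main() {
  // For "ymd" and separator "-", the first six lines are:
  // %Y-%m-%d, %Y-%b-%d, %Y-%B-%d, %y-%m-%d, %y-%b-%d, %y-%B-%d
  for (const std::string& f : ExpandOrder("ymd", {"-", "/"})) {
    std::printf("%s\n", f.c_str());
  }
}
{code}

The resulting list could then be tried in order by a multi-format parser like the one the CSV reader already has. lubridate is more lenient than a fixed list like this (for example about separators), so a real binding would probably need more than simple expansion.
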
I guess my question is: can we factor this CSV reader feature out so that it is
usable elsewhere?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
