[ 
https://issues.apache.org/jira/browse/ARROW-14471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488048#comment-17488048
 ] 

Dragoș Moldovan-Grünfeld commented on ARROW-14471:
--------------------------------------------------

[~paleolimbot] I don't think we can rely on {{coalesce()}} to iterate through 
the various formats supported by {{ymd()}}. That approach relies on the 
assumption that parsing fails whenever the passed {{format}} does not match 
the data. Sadly, arrow accepts a wrong format and returns weird timestamps 
instead of failing:

{code:r}
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(arrow))
suppressPackageStartupMessages(library(lubridate))

df <- tibble(x = c("09-01-01", "09-01-02", "09-01-03"))
df
#> # A tibble: 3 × 1
#>   x       
#>   <chr>   
#> 1 09-01-01
#> 2 09-01-02
#> 3 09-01-03

# lubridate::ymd()
df %>% 
  mutate(y = ymd(x))
#> # A tibble: 3 × 2
#>   x        y         
#>   <chr>    <date>    
#> 1 09-01-01 2009-01-01
#> 2 09-01-02 2009-01-02
#> 3 09-01-03 2009-01-03

# %y = two-digit year: correct result
df %>% 
  record_batch() %>% 
  mutate(y = strptime(x, format = "%y-%m-%d", unit = "us")) %>% 
  collect()
#> # A tibble: 3 × 2
#>   x        y                  
#>   <chr>    <dttm>             
#> 1 09-01-01 2009-01-01 00:00:00
#> 2 09-01-02 2009-01-02 00:00:00
#> 3 09-01-03 2009-01-03 00:00:00

# %Y = four-digit year: this should fail in order for us to rely on coalesce()
df %>% 
  record_batch() %>% 
  mutate(y = strptime(x, format = "%Y-%m-%d", unit = "us")) %>% 
  collect()
#> # A tibble: 3 × 2
#>   x        y                  
#>   <chr>    <dttm>             
#> 1 09-01-01 0008-12-31 23:58:45
#> 2 09-01-02 0009-01-01 23:58:45
#> 3 09-01-03 0009-01-02 23:58:45
{code}

Therefore, my conclusion is that we cannot implement the arrow {{ymd()}} 
binding as {{coalesce(strptime(x, format1), strptime(x, format2), ...)}}. What 
do you think?
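
As a rough illustration of why the order of formats matters, here is a minimal 
base R sketch (no arrow involved). {{parse_first_match()}} is a hypothetical 
helper, not part of arrow or lubridate, that mimics {{coalesce()}} over several 
{{as.Date()}} calls. It relies on base R's {{as.Date()}} behaving like the arrow 
output above, that is, accepting {{%Y}} for a two-digit year instead of 
returning {{NA}}:

{code:r}
x <- c("09-01-01", "09-01-02", "09-01-03")

# Hypothetical helper (not part of arrow or lubridate): keep the first
# non-NA parse for each element, trying the formats in the given order.
parse_first_match <- function(x, formats) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out)
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

# %Y tried first: it "succeeds" with year 9, so %y is never attempted
wrong <- parse_first_match(x, c("%Y-%m-%d", "%y-%m-%d"))
as.POSIXlt(wrong)$year + 1900

# %y tried first: the years are read as 2009
right <- parse_first_match(x, c("%y-%m-%d", "%Y-%m-%d"))
as.POSIXlt(right)$year + 1900
{code}

With a parser that rejected the mismatched format, the first call would fall 
through to {{%y}}. Here the result depends entirely on the order in which the 
formats are tried.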

> [R] Implement lubridate's date/time parsing functions
> -----------------------------------------------------
>
>                 Key: ARROW-14471
>                 URL: https://issues.apache.org/jira/browse/ARROW-14471
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: R
>            Reporter: Nicola Crane
>            Assignee: Dragoș Moldovan-Grünfeld
>            Priority: Major
>             Fix For: 8.0.0
>
>
> Parse dates with year, month, and day components:
> ymd() ydm() mdy() myd() dmy() dym() yq() ym() my()
>       
> Parse date-times with year, month, day, hour, minute, and second 
> components:
> ymd_hms() ymd_hm() ymd_h() dmy_hms() dmy_hm() dmy_h() mdy_hms() mdy_hm() 
> mdy_h() ydm_hms() ydm_hm() ydm_h()
> Parse periods with hour, minute, and second components:
> ms() hm() hms()
>       



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
