[
https://issues.apache.org/jira/browse/ARROW-14471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488048#comment-17488048
]
Dragoș Moldovan-Grünfeld commented on ARROW-14471:
--------------------------------------------------
[~paleolimbot] I don't think we can rely on {{coalesce()}} to iterate through
the various formats supported by {{ymd()}}. That approach assumes that
{{strptime()}} either parses correctly because the passed {{format}} matches
the data, or fails. Sadly, arrow accepts a wrong format without failing and
returns weird timestamps:
{code:r}
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(arrow))
suppressPackageStartupMessages(library(lubridate))
df <- tibble(x = c("09-01-01", "09-01-02", "09-01-03"))
df
#> # A tibble: 3 × 1
#>   x
#>   <chr>
#> 1 09-01-01
#> 2 09-01-02
#> 3 09-01-03
# lubridate::ymd()
df %>%
  mutate(y = ymd(x))
#> # A tibble: 3 × 2
#>   x        y
#>   <chr>    <date>
#> 1 09-01-01 2009-01-01
#> 2 09-01-02 2009-01-02
#> 3 09-01-03 2009-01-03
# %y = short (2-digit) year: parsed correctly
df %>%
  record_batch() %>%
  mutate(y = strptime(x, format = "%y-%m-%d", unit = "us")) %>%
  collect()
#> # A tibble: 3 × 2
#>   x        y
#>   <chr>    <dttm>
#> 1 09-01-01 2009-01-01 00:00:00
#> 2 09-01-02 2009-01-02 00:00:00
#> 3 09-01-03 2009-01-03 00:00:00
# %Y = long (4-digit) year: this would have to fail for us to rely on coalesce()
df %>%
  record_batch() %>%
  mutate(y = strptime(x, format = "%Y-%m-%d", unit = "us")) %>%
  collect()
#> # A tibble: 3 × 2
#>   x        y
#>   <chr>    <dttm>
#> 1 09-01-01 0008-12-31 23:58:45
#> 2 09-01-02 0009-01-01 23:58:45
#> 3 09-01-03 0009-01-02 23:58:45
{code}
Therefore, my conclusion is that we cannot implement the {{arrow::ymd()}}
binding as {{coalesce(strptime(x, format1), strptime(x, format2), ...)}}. What
do you think?
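For illustration only, one possible way around this would be to check the shape
of the string before parsing, so that a format is only tried on input it can
actually match. The sketch below uses base R rather than arrow bindings, and the
helper names {{parse_if_matches()}} and {{coalesce_dates()}} are made up for
this example; they are not arrow or lubridate functions. Whether an equivalent
guard can be expressed in an arrow binding is an assumption that would still
need checking.
{code:r}
# Hypothetical helper (not an arrow or lubridate function): parse `x` with
# `format` only where `x` also matches the regular expression `pattern`;
# every other element becomes NA.
parse_if_matches <- function(x, pattern, format) {
  out <- rep(as.Date(NA), length(x))
  ok <- !is.na(x) & grepl(pattern, x)
  out[ok] <- as.Date(x[ok], format = format)
  out
}

# Hypothetical coalesce for Date vectors: take the first non-NA candidate
# for each element.
coalesce_dates <- function(...) {
  candidates <- list(...)
  out <- candidates[[1]]
  for (candidate in candidates[-1]) {
    missing <- is.na(out)
    out[missing] <- candidate[missing]
  }
  out
}

x <- c("09-01-01", "2009-01-02", "not a date")

# The 4-digit-year format is guarded by a 4-digit pattern, so "09-01-01"
# is left as NA by the first candidate and picked up by the second.
y <- coalesce_dates(
  parse_if_matches(x, "^[0-9]{4}-[0-9]{2}-[0-9]{2}$", "%Y-%m-%d"),
  parse_if_matches(x, "^[0-9]{2}-[0-9]{2}-[0-9]{2}$", "%y-%m-%d")
)
y
#> [1] "2009-01-01" "2009-01-02" NA
{code}
With the guard in place the order of the candidates no longer matters for these
inputs, because each string can match at most one of the two patterns.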
> [R] Implement lubridate's date/time parsing functions
> -----------------------------------------------------
>
> Key: ARROW-14471
> URL: https://issues.apache.org/jira/browse/ARROW-14471
> Project: Apache Arrow
> Issue Type: Sub-task
> Components: R
> Reporter: Nicola Crane
> Assignee: Dragoș Moldovan-Grünfeld
> Priority: Major
> Fix For: 8.0.0
>
>
> Parse dates with year, month, and day components:
> ymd() ydm() mdy() myd() dmy() dym() yq() ym() my()
>
> Parse date-times with year, month, day, hour, minute, and second
> components:
> ymd_hms() ymd_hm() ymd_h() dmy_hms() dmy_hm() dmy_h() mdy_hms() mdy_hm()
> mdy_h() ydm_hms() ydm_hm() ydm_h()
> Parse periods with hour, minute, and second components:
> ms() hm() hms()
>