Dragoș Moldovan-Grünfeld created ARROW-16596:
------------------------------------------------
Summary: [C++] Add option to control the cutoff between 1900 and
2000 when %y
Key: ARROW-16596
URL: https://issues.apache.org/jira/browse/ARROW-16596
Project: Apache Arrow
Issue Type: Improvement
Components: C++, R
Affects Versions: 8.0.0
Reporter: Dragoș Moldovan-Grünfeld
When parsing a string with a two-digit year ({{%y}}) to a datetime, it would
be great to have control over the cutoff point between 1900 and 2000.
Currently it is implicitly set to 68, so values up to 68 are read as 20xx and
values from 69 upwards as 19xx:
{code:r}
library(arrow, warn.conflicts = FALSE)
a <- Array$create(c("68-05-17", "69-05-17"))
call_function("strptime", a, options = list(format = "%y-%m-%d", unit = 0L))
#> Array
#> <timestamp[s]>
#> [
#> 2068-05-17 00:00:00,
#> 1969-05-17 00:00:00
#> ]
{code}
For example, lubridate exposes this as the {{cutoff_2000}} argument (e.g. for
{{fast_strptime()}}). It works as follows:
{code:r}
library(lubridate, warn.conflicts = FALSE)
dates_vector <- c("68-05-17", "69-05-17", "55-05-17")
fast_strptime(dates_vector, format = "%y-%m-%d")
#> [1] "2068-05-17 UTC" "1969-05-17 UTC" "2055-05-17 UTC"
fast_strptime(dates_vector, format = "%y-%m-%d", cutoff_2000 = 50)
#> [1] "1968-05-17 UTC" "1969-05-17 UTC" "1955-05-17 UTC"
fast_strptime(dates_vector, format = "%y-%m-%d", cutoff_2000 = 70)
#> [1] "2068-05-17 UTC" "2069-05-17 UTC" "2055-05-17 UTC"
{code}
In the {{lubridate::fast_strptime()}} documentation it is described as follows:
{quote}
cutoff_2000
integer. For y format, two-digit numbers smaller or equal to cutoff_2000 are
parsed as though starting with 20, otherwise parsed as though starting with 19.
Available only for functions relying on lubridates internal parser.
{quote}
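The rule quoted above could be sketched on the C++ side roughly as follows. This is only an illustration: {{ExpandTwoDigitYear}} is a hypothetical helper and {{cutoff_2000}} a hypothetical option name borrowed from lubridate; neither exists in Arrow today. With the default of 68 the sketch reproduces the current implicit behaviour.

```cpp
#include <stdexcept>

// Hypothetical helper (not part of Arrow): expands a two-digit year parsed
// from %y into a four-digit year. Values less than or equal to cutoff_2000
// are placed in the 2000s, all other values in the 1900s.
int ExpandTwoDigitYear(int yy, int cutoff_2000 = 68) {
  if (yy < 0 || yy > 99) {
    throw std::invalid_argument("two-digit year must be in [0, 99]");
  }
  return yy <= cutoff_2000 ? 2000 + yy : 1900 + yy;
}
```

Under these assumptions, {{ExpandTwoDigitYear(68)}} gives 2068 and {{ExpandTwoDigitYear(69)}} gives 1969, while {{ExpandTwoDigitYear(68, 50)}} gives 1968, matching the lubridate output shown earlier.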