[ https://issues.apache.org/jira/browse/ARROW-14846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494516#comment-17494516 ]
Alenka Frim commented on ARROW-14846:
-------------------------------------
I have been looking into lubridate's _stamp()_ function and would like to share some research and ask for ideas.
As far as I can see, the function has two main parts (I am simplifying). For reference, this is what it does end to end:
{code:r}
> library(lubridate)
> D <- c(ymd("2022-04-05"), ymd("2023-04-05"), ymd("2024-04-05"))
> example_string <- "Arrow bbq party on Friday in the year 2022! =)"
> stamp(example_string)(D)
Multiple formats matched: "Arrow bbq party on %A in the year %H%M! =)"(1),
"Arrow bbq party on %A in the year %M%S! =)"(1),
"Arrow bbq party on %A in the year %Y! =)"(1),
"Arrow bbq party on Friday in the year %H%M! =)"(1),
"Arrow bbq party on Friday in the year %M%S! =)"(1),
"Arrow bbq party on Friday in the year %Y! =)"(1)
Using: "Arrow bbq party on %A in the year %Y! =)"
[1] "Arrow bbq party on Tuesday in the year 2022! =)" "Arrow bbq party on Wednesday in the year 2023! =)"
[3] "Arrow bbq party on Friday in the year 2024! =)"
{code}
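*1.* Finding all possible date formats in an arbitrary example string and selecting the most probable one. This relies on the _train_formats()_ and _guess_formats()_ functions, among others. In the snippet below, _orders_ and _locale_ stand for the corresponding arguments of _stamp()_.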
{code:r}
> unique(guess_formats(example_string, orders, locale))
[1] "Arrow bbq party on %A in the year %H%M! =)" "Arrow bbq party on %A in the year %M%S! =)"
[3] "Arrow bbq party on %A in the year %Y! =)" "Arrow bbq party on Friday in the year %H%M! =)"
[5] "Arrow bbq party on Friday in the year %M%S! =)" "Arrow bbq party on Friday in the year %Y! =)"
{code}
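To make step 1 a bit more concrete, here is a rough base-R sketch (no lubridate, no Arrow) of the "enumerate candidates, then pick one" idea. It is only an illustration under strong assumptions: the helpers _guess_candidate_formats()_ and _pick_format()_ are hypothetical, only English weekday names and four-digit numbers are recognised, and the selection rule (drop time fields for Date input, then prefer the candidate with the most conversion specifications) is a simplification chosen for this sketch, not what _train_formats()_ actually does.

{code:r}
# Hypothetical helpers for illustration only; this is NOT lubridate's
# guess_formats()/train_formats() logic, just the same idea in base R.

# Enumerate candidate format strings for an example string.
guess_candidate_formats <- function(x) {
  weekdays_en <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                   "Friday", "Saturday", "Sunday")
  tokens <- strsplit(x, " ", fixed = TRUE)[[1]]
  # For every token, list the ways it could be interpreted.
  choices <- lapply(tokens, function(tok) {
    core <- sub("[^[:alnum:]]+$", "", tok)             # e.g. "2022!" -> "2022"
    suffix <- substr(tok, nchar(core) + 1, nchar(tok))  # e.g. "!"
    if (core %in% weekdays_en) {
      c(tok, paste0("%A", suffix))                      # literal text or weekday name
    } else if (grepl("^[0-9]{4}$", core)) {
      paste0(c("%Y", "%H%M", "%M%S"), suffix)           # year, HHMM or MMSS
    } else {
      tok                                               # ordinary text stays literal
    }
  })
  # Every combination of per-token choices is one candidate format.
  grid <- expand.grid(choices, stringsAsFactors = FALSE)
  unique(unname(apply(grid, 1, paste, collapse = " ")))
}

# Pick one candidate. Assumed heuristic: for Date input, drop candidates with
# time fields, then prefer the one with the most conversion specifications.
pick_format <- function(candidates) {
  date_only <- candidates[!grepl("%[HMS]", candidates)]
  n_specs <- lengths(regmatches(date_only, gregexpr("%[A-Za-z]", date_only)))
  date_only[which.max(n_specs)]
}

example_string <- "Arrow bbq party on Friday in the year 2022! =)"
candidates <- guess_candidate_formats(example_string)
print(candidates)
print(pick_format(candidates))
{code}

On the example string this yields the same six candidates as _guess_formats()_ above (in a different order) and picks "Arrow bbq party on %A in the year %Y! =)". A real implementation would of course need many more token types and a principled ranking.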
*2.* Calling _format()_ with the format selected in step 1 to insert the date information into the example string.
{code:r}
> possible_formats <- unique(guess_formats(example_string, orders, locale))
> format(D, possible_formats[3])
[1] "Arrow bbq party on Tuesday in the year 2022! =)" "Arrow bbq party on Wednesday in the year 2023! =)"
[3] "Arrow bbq party on Friday in the year 2024! =)"
{code}
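Once a format is known, step 2 on its own is plain base R. A minimal sketch, assuming the format from step 1 is already available: the hypothetical _make_stamper()_ mimics how _stamp()_ returns a function that is then applied to the dates. Since %A is locale-dependent, the sketch temporarily switches LC_TIME to "C" to get English weekday names.

{code:r}
# Hypothetical sketch of step 2 in base R: like stamp(), build a function
# once from a known format, then apply it to dates with format().
make_stamper <- function(fmt) {
  function(x) format(x, format = fmt)
}

# %A depends on LC_TIME; switch to "C" for English weekday names, then restore.
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

D <- as.Date(c("2022-04-05", "2023-04-05", "2024-04-05"))
stamper <- make_stamper("Arrow bbq party on %A in the year %Y! =)")
out <- stamper(D)
print(out)

invisible(Sys.setlocale("LC_TIME", old_locale))
{code}

This prints the same three strings as the _format()_ call above; the hard part of _stamp()_ is therefore step 1, not step 2.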
Both of these steps would have to be implemented before we could define a binding for the _stamp()_ function (if I am not mistaken, there are no existing Arrow functions that I could reuse). Neither of them is trivial to implement on the R side, and neither is needed on the C++ side.
In Python, I can't find a function similar to _guess_formats()_.
> [R] Bindings for lubridate's stamp, stamp_date, and stamp_time
> --------------------------------------------------------------
>
> Key: ARROW-14846
> URL: https://issues.apache.org/jira/browse/ARROW-14846
> Project: Apache Arrow
> Issue Type: Sub-task
> Components: R
> Reporter: Nicola Crane
> Assignee: Alenka Frim
> Priority: Major
> Labels: good-first-issue
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)