[ 
https://issues.apache.org/jira/browse/ARROW-14846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494516#comment-17494516
 ] 

Alenka Frim commented on ARROW-14846:
-------------------------------------

Having looked into lubridate's _stamp()_ function, I would like to share some
findings and ask for ideas.

As far as I can see, the function has two main parts (I am simplifying):
{code:r}
> library(lubridate)
> D <- c(ymd("2022-04-05"), ymd("2023-04-05"), ymd("2024-04-05"))
> example_string <- "Arrow bbq party on Friday in the year 2022! =)"

> stamp(example_string)(D)
Multiple formats matched: "Arrow bbq party on %A in the year %H%M! =)"(1),
"Arrow bbq party on %A in the year %M%S! =)"(1),
"Arrow bbq party on %A in the year %Y! =)"(1),
"Arrow bbq party on Friday in the year %H%M! =)"(1),
"Arrow bbq party on Friday in the year %M%S! =)"(1),
"Arrow bbq party on Friday in the year %Y! =)"(1)
Using: "Arrow bbq party on %A in the year %Y! =)"
[1] "Arrow bbq party on Tuesday in the year 2022! =)"
[2] "Arrow bbq party on Wednesday in the year 2023! =)"
[3] "Arrow bbq party on Friday in the year 2024! =)"
{code}
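*1.* Finding all candidate date formats in an arbitrary example string and
selecting the most probable one. This uses the _train_formats()_ and
_guess_formats()_ functions, among others. In the snippet below, _orders_ and
_locale_ are the arguments of the same names that _stamp()_ receives.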

{code:r}
> unique(guess_formats(example_string, orders, locale))
[1] "Arrow bbq party on %A in the year %H%M! =)"
[2] "Arrow bbq party on %A in the year %M%S! =)"
[3] "Arrow bbq party on %A in the year %Y! =)"
[4] "Arrow bbq party on Friday in the year %H%M! =)"
[5] "Arrow bbq party on Friday in the year %M%S! =)"
[6] "Arrow bbq party on Friday in the year %Y! =)"
{code}
*2.* Calling _format()_ with the format selected in step 1 to render each date
into the example string.

{code:r}
> possible_formats <- unique(guess_formats(example_string, orders, locale))

> format(D, possible_formats[3])
[1] "Arrow bbq party on Tuesday in the year 2022! =)"
[2] "Arrow bbq party on Wednesday in the year 2023! =)"
[3] "Arrow bbq party on Friday in the year 2024! =)"
{code}
Both of these steps would have to be implemented in order to define the stamp
function (if I am not mistaken, there are no existing Arrow functions that I
could use). Neither step is trivial to implement on the R side, and neither is
needed on the C++ side.
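
To make the two-step structure concrete, here is a minimal base-R sketch. It is
not lubridate's implementation and not an Arrow binding: _toy_guess_format()_
and _toy_stamp()_ are hypothetical names, and step 1 is reduced to the
assumption that the first run of four digits in the example string is the year.
The real _guess_formats()_ handles many more orders and locales.

{code:r}
# Toy stand-in for step 1 (assumption: the first four-digit run is the year).
# A literal "%" in the example string is not escaped here.
toy_guess_format <- function(example) {
  sub("[0-9]{4}", "%Y", example)
}

# Step 2: return a function that renders dates with the chosen format.
toy_stamp <- function(example) {
  fmt <- toy_guess_format(example)
  function(x) format(x, format = fmt)
}

D <- as.Date(c("2022-04-05", "2023-04-05", "2024-04-05"))
stamp_year <- toy_stamp("Arrow bbq party in the year 2022! =)")
stamp_year(D)
{code}

The closure in step 2 is the easy part; almost all of the difficulty sits in
replacing the toy guess with something as general as step 1 above.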

In Python I can't find a function similar to _guess_formats()_.

> [R] Bindings for lubridate's stamp, stamp_date, and stamp_time
> --------------------------------------------------------------
>
>                 Key: ARROW-14846
>                 URL: https://issues.apache.org/jira/browse/ARROW-14846
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: R
>            Reporter: Nicola Crane
>            Assignee: Alenka Frim
>            Priority: Major
>              Labels: good-first-issue
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)