[ https://issues.apache.org/jira/browse/ARROW-13548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403354#comment-17403354 ]

Ian Cook commented on ARROW-13548:
----------------------------------

Note regarding the R bindings: I don't think any existing R functions map onto 
the functions added here, at least not in a straightforward way. The functions 
added here take a start timestamp and an end timestamp, and they compute how 
many time unit boundaries are crossed between the two timestamps.
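To make the "boundaries crossed" semantics concrete, here is a minimal Python sketch for naive timestamps. The helper names (count_year_boundaries and so on) are hypothetical and are not the Arrow C++ kernels or any binding of them; they only illustrate the counting rule, in which everything finer than the unit is ignored.

```python
from datetime import datetime

# Hypothetical helpers for illustration only (not Arrow's API).
# Each counts how many unit boundaries lie between two naive timestamps,
# ignoring every field finer than the unit being counted.

def count_year_boundaries(start: datetime, end: datetime) -> int:
    # Only the year fields matter: Dec 31 -> Jan 1 counts as one boundary.
    return end.year - start.year

def count_month_boundaries(start: datetime, end: datetime) -> int:
    # Day of month and time of day are ignored.
    return (end.year - start.year) * 12 + (end.month - start.month)

def count_day_boundaries(start: datetime, end: datetime) -> int:
    # Compare calendar dates, i.e. count the midnights crossed.
    return (end.date() - start.date()).days

def count_hour_boundaries(start: datetime, end: datetime) -> int:
    # Truncate both timestamps to the hour, then count whole hours between them.
    s = start.replace(minute=0, second=0, microsecond=0)
    e = end.replace(minute=0, second=0, microsecond=0)
    return int((e - s).total_seconds()) // 3600


start = datetime(2021, 12, 31, 23, 30)
end = datetime(2022, 1, 1, 0, 15)  # only 45 minutes later
print(count_year_boundaries(start, end))   # 1
print(count_month_boundaries(start, end))  # 1
print(count_day_boundaries(start, end))    # 1
print(count_hour_boundaries(start, end))   # 1
```

Conversely, a span from 2021-01-01 to 2021-01-31 crosses 30 day boundaries but no month boundary.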

In base R and various R packages like clock and lubridate, when the difference 
between two dates is computed, the result is typically a difftime / period / 
duration object that stores only the amount of time that elapsed. These objects 
_do not_ record when the period started. The functions that return the 
number of years/months/days/hours/etc. in a difftime / period / duration are 
unary and do not work by counting how many time unit boundaries were crossed.
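The same distinction can be seen in Python, where datetime.timedelta stands in for a difftime / duration here: like those R objects, it stores only the elapsed amount (normalized to days, seconds and microseconds), not the start. A minimal sketch with two spans of identical length:

```python
from datetime import datetime, timedelta

# Two spans with the same elapsed time but different anchors.
a_start, a_end = datetime(2021, 1, 31, 23, 59), datetime(2021, 2, 1, 0, 1)
b_start, b_end = datetime(2021, 2, 10, 12, 0), datetime(2021, 2, 10, 12, 2)

a_elapsed = a_end - a_start
b_elapsed = b_end - b_start
print(a_elapsed == b_elapsed == timedelta(minutes=2))  # True

# A unary accessor on the elapsed value cannot tell the spans apart.
print(a_elapsed.days, b_elapsed.days)  # 0 0

# Counting day boundaries needs both endpoints, not just the elapsed time.
a_days_crossed = (a_end.date() - a_start.date()).days
b_days_crossed = (b_end.date() - b_start.date()).days
print(a_days_crossed, b_days_crossed)  # 1 0
```

The first span also crosses a month boundary, which no function of the two-minute duration alone could report.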

However, lubridate also includes a class called 
[Interval|https://lubridate.tidyverse.org/reference/Interval-class.html] that 
_does_ store when the period started:
{quote}Interval is an S4 class that extends the Timespan class. An Interval 
object records one or more spans of time. Intervals record these timespans as a 
sequence of seconds that begin at a specified date. Since intervals are 
anchored to a precise moment of time, they can accurately be converted to 
Period or Duration class objects. This is because we can observe the length in 
seconds of each period that begins on a specific date. Contrast this to a 
generalized period, which may not have a consistent length in seconds (e.g. the 
number of seconds in a year will change if it is a leap year).
{quote}
However, there are no functions in lubridate that take an Interval and return 
the number of time unit boundaries that were crossed in that Interval.
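The leap-year point in the quoted passage, and why a span must be anchored before its boundaries can be counted, can be checked with a short standard-library Python sketch (it does not use lubridate):

```python
from datetime import datetime, timedelta

# The length of "one year" in seconds depends on which year it is.
leap = (datetime(2021, 1, 1) - datetime(2020, 1, 1)).total_seconds()
common = (datetime(2022, 1, 1) - datetime(2021, 1, 1)).total_seconds()
print(leap, common)  # 31622400.0 31536000.0

# The same 365-day duration crosses a different number of year boundaries
# depending on where it is anchored (2020 is a leap year, 2021 is not).
span = timedelta(days=365)
for start in (datetime(2020, 1, 1), datetime(2021, 1, 1)):
    end = start + span
    print(start.date(), end.date(), end.year - start.year)
# 2020-01-01 2020-12-31 0
# 2021-01-01 2022-01-01 1
```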

> [C++] Implement datediff kernel
> -------------------------------
>
>                 Key: ARROW-13548
>                 URL: https://issues.apache.org/jira/browse/ARROW-13548
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: David Li
>            Assignee: David Li
>            Priority: Major
>              Labels: compute, kernel, pull-request-available
>             Fix For: 6.0.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Add a kernel to compute the number of years, months, weeks, days, hours, 
> minutes, (micro/milli/nano)seconds, or quarters between two timestamps. 
> This should act like SQL's DATEDIFF ([SQL 
> Server|https://docs.microsoft.com/en-us/sql/t-sql/functions/datediff-transact-sql?view=sql-server-ver15]).
>  Pandas doesn't have a convenient equivalent except in the case of days 
> (pd.Timedelta.days) but it can be [calculated using 
> Timestamp.to_period|https://stackoverflow.com/questions/54171674/calculating-the-amount-of-full-months-between-two-dates].
> We have Hinnant's date library vendored and this should hopefully be 
> implementable with that.
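The issue also lists quarters and weeks. Below is a minimal Python sketch of boundary counting for those two units; the helper names are hypothetical and not Arrow's API. Week boundaries depend on which day starts the week. The Sunday default here reflects our reading of the SQL Server DATEDIFF documentation linked above and is an assumption, so it is exposed as a parameter.

```python
from datetime import date, datetime, timedelta

SUNDAY = 6  # Python's date.weekday(): Monday == 0 ... Sunday == 6

def count_quarter_boundaries(start: datetime, end: datetime) -> int:
    # Quarter index within the year: Jan-Mar -> 0, ..., Oct-Dec -> 3.
    sq = (start.month - 1) // 3
    eq = (end.month - 1) // 3
    return (end.year - start.year) * 4 + (eq - sq)

def week_floor(d: date, week_start: int) -> date:
    # Step back to the most recent day on which a week begins.
    return d - timedelta(days=(d.weekday() - week_start) % 7)

def count_week_boundaries(start: datetime, end: datetime,
                          week_start: int = SUNDAY) -> int:
    # Assumed week start (Sunday by default); both floors are whole weeks apart.
    s = week_floor(start.date(), week_start)
    e = week_floor(end.date(), week_start)
    return (e - s).days // 7


sat = datetime(2021, 8, 21)  # a Saturday
sun = datetime(2021, 8, 22)  # the following Sunday
print(count_week_boundaries(sat, sun))                # 1 (weeks start on Sunday)
print(count_week_boundaries(sat, sun, week_start=0))  # 0 (weeks start on Monday)
print(count_quarter_boundaries(datetime(2020, 12, 31), datetime(2021, 1, 1)))  # 1
```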



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
