[
https://issues.apache.org/jira/browse/ARROW-13548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403354#comment-17403354
]
Ian Cook commented on ARROW-13548:
----------------------------------
Note regarding the R bindings: I don't think any existing R functions map onto
the functions added here, at least not in a straightforward way. The functions
added here take a start timestamp and an end timestamp, and they compute how
many time unit boundaries are crossed between the two timestamps.
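To make the boundary-counting semantics concrete, here is a minimal Python sketch using only the standard library. The helper names (`years_between`, `months_between`, `days_between`) are hypothetical and chosen for illustration; they are not the names of the kernels added in this issue, and the sketch assumes naive timestamps with no time zone handling.

```python
from datetime import datetime

# Hypothetical helpers: each counts how many unit boundaries lie between
# two timestamps, not how much time elapsed between them.

def years_between(start: datetime, end: datetime) -> int:
    # One boundary is crossed each time the calendar year changes.
    return end.year - start.year

def months_between(start: datetime, end: datetime) -> int:
    # Count calendar-month changes, carrying the year difference.
    return (end.year - start.year) * 12 + (end.month - start.month)

def days_between(start: datetime, end: datetime) -> int:
    # Count midnights crossed by comparing calendar dates only.
    return (end.date() - start.date()).days

# Two timestamps only two minutes apart, straddling New Year.
start = datetime(2020, 12, 31, 23, 59)
end = datetime(2021, 1, 1, 0, 1)

elapsed_seconds = (end - start).total_seconds()  # 120.0
crossed = (
    years_between(start, end),   # 1
    months_between(start, end),  # 1
    days_between(start, end),    # 1
)
```

Only 120 seconds elapse here, yet one year boundary, one month boundary, and one day boundary are crossed, which is the behavior SQL Server's DATEDIFF documents.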
In base R and various R packages like clock and lubridate, when the difference
between two dates is computed, the result is typically a difftime / period /
duration object that stores only the amount of time that elapsed. These objects
_do not_ specify when the period started. The functions that return the
number of years/months/days/hours/etc. in a difftime / period / duration are
unary and do not work by counting how many time unit boundaries were crossed.
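The reason an elapsed-time object cannot answer a boundary-crossing question can be sketched in Python, with `datetime.timedelta` standing in for an R difftime / duration. This is an illustration under that assumption, not a statement about how any R package is implemented, and `year_boundaries_crossed` is a hypothetical helper.

```python
from datetime import datetime, timedelta

def year_boundaries_crossed(start: datetime, end: datetime) -> int:
    # Hypothetical helper: number of calendar-year changes between the two.
    return end.year - start.year

# The same elapsed time, anchored at two different start points.
elapsed = timedelta(days=10)
mid_year_start = datetime(2021, 6, 1)
late_year_start = datetime(2021, 12, 27)

mid_year_count = year_boundaries_crossed(
    mid_year_start, mid_year_start + elapsed
)  # 0: both endpoints fall in 2021
late_year_count = year_boundaries_crossed(
    late_year_start, late_year_start + elapsed
)  # 1: the span runs into 2022
```

Because the same ten-day duration yields different counts depending on where it starts, a function that receives only the duration has no way to recover the number of boundaries crossed; it needs both endpoints.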
However, lubridate also includes a class called
[Interval|https://lubridate.tidyverse.org/reference/Interval-class.html] that
_does_ store when the period started:
{quote}Interval is an S4 class that extends the Timespan class. An Interval
object records one or more spans of time. Intervals record these timespans as a
sequence of seconds that begin at a specified date. Since intervals are
anchored to a precise moment of time, they can accurately be converted to
Period or Duration class objects. This is because we can observe the length in
seconds of each period that begins on a specific date. Contrast this to a
generalized period, which may not have a consistent length in seconds (e.g. the
number of seconds in a year will change if it is a leap year).
{quote}
However, there are no functions in lubridate that take an Interval and return
the number of time unit boundaries that were crossed in that Interval.
> [C++] Implement datediff kernel
> -------------------------------
>
> Key: ARROW-13548
> URL: https://issues.apache.org/jira/browse/ARROW-13548
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: David Li
> Assignee: David Li
> Priority: Major
> Labels: compute, kernel, pull-request-available
> Fix For: 6.0.0
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> Add a kernel to compute the number of years, months, weeks, days, hours,
> minutes, (micro/milli/nano)seconds, or quarters between two timestamps.
> This should act like SQL's DATEDIFF ([SQL
> Server|https://docs.microsoft.com/en-us/sql/t-sql/functions/datediff-transact-sql?view=sql-server-ver15]).
> Pandas doesn't have a convenient equivalent except in the case of days
> (pd.Timedelta.days) but it can be [calculated using
> Timestamp.to_period|https://stackoverflow.com/questions/54171674/calculating-the-amount-of-full-months-between-two-dates].
> We have Hinnant's date library vendored, and this should hopefully be
> implementable with that.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)