jorisvandenbossche commented on issue #33962:
URL: https://github.com/apache/arrow/issues/33962#issuecomment-1864285994

   We might need some more discussion about what we actually want here. The 
current PR adds "day", "second", "milli/micro/nanosecond" and "subsecond" 
kernels. And I think this is mostly modelled after the Python 
`datetime.timedelta` attributes (see also 
https://pandas.pydata.org/docs/user_guide/timedeltas.html#attributes for some 
context). 
   
   For example, the "second" kernel in the PR would return the seconds 
component of the duration value, i.e. the number of seconds left over once 
whole days are removed (>= 0 and < 1 day). Equivalent Python example:
   
   ```python
   >>> import datetime
   >>> td = datetime.timedelta(days=2, hours=3, seconds=4, milliseconds=5)
   >>> td.seconds
   10804
   # which is 3 hours (60*60 seconds) + 4 seconds
   >>> 3*3600+4
   10804
   ```
   
   But one reason Python has those attributes is that this is how 
`timedelta` is implemented under the hood: it stores separate numbers of days, 
seconds and microseconds 
(https://docs.python.org/3/library/datetime.html#timedelta-objects). 
   In Arrow, we simply store a single value (number of 
(milli/micro/nano)seconds depending on the unit), so it doesn't necessarily 
make sense to copy the interface of Python's `datetime.timedelta` to extract 
those components (for example, why days and seconds, and not also hours?). Also 
note that the Python attributes are plural, in contrast to the names for the 
timestamp/date/time parts.
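
   To make the storage difference concrete, here is a minimal sketch using only 
the Python standard library. It assumes a microsecond unit for the Arrow-style 
single integer; the variable names are illustrative and not part of any Arrow 
API.

```python
import datetime

td = datetime.timedelta(days=2, hours=3, seconds=4, milliseconds=5)

# Python normalizes every timedelta into three stored fields.
stored_fields = (td.days, td.seconds, td.microseconds)

# Arrow-style storage: one integer count of a fixed unit.
# Assumption for this sketch: the unit is microseconds.
total_us = td // datetime.timedelta(microseconds=1)

# Python's attributes can be recovered from the single integer,
# but nothing in that representation singles out days and seconds.
US_PER_SECOND = 1_000_000
US_PER_DAY = 86_400 * US_PER_SECOND
days, rem_us = divmod(total_us, US_PER_DAY)
seconds, microseconds = divmod(rem_us, US_PER_SECOND)
```

   The same `divmod` chain could just as well split out hours or minutes, which 
is why the days/seconds/microseconds split looks arbitrary for a single-value 
representation.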
   
   Checking some other software for the kinds of operations that are 
supported for duration types:
   
   - Python's `datetime.timedelta` has an additional method `total_seconds()`, 
which always returns the entire duration in seconds as a float (in the example 
above, `td.total_seconds()` returns 183604.005). 
     This could be useful to add as an easier way to get the duration in 
seconds, regardless of the unit (you can already achieve this currently by 
dividing by a duration of 1 second).
   - As mentioned earlier in this thread, pandas has an additional `components` 
attribute that gives you the different components as they would be 
_displayed_, i.e. actually split into days/hours/minutes/seconds/milli...
   - The R lubridate package doesn't seem to have specific methods on its 
duration type for this kind of operation 
(https://lubridate.tidyverse.org/reference/index.html#durations)
   - The Joda-Time Java package has 
`getStandardDays`/`getStandardHours`/`getStandardMinutes`/`getStandardSeconds` 
methods (https://www.joda.org/joda-time/key_duration.html, 
https://www.joda.org/joda-time/apidocs/org/joda/time/Duration.html). But in 
this case, they are not "mutually exclusive", i.e. the seconds still include 
the days/hours/minutes as well.
   - The Rust chrono crate has a Duration type with 
`num_days`/`num_hours`/`num_minutes`/etc. methods 
(https://docs.rs/chrono/latest/chrono/struct.Duration.html), but again they 
return the total number of days/hours/minutes/seconds/etc., and not e.g. the 
number of hours after the number of days has already been subtracted (i.e. the 
number of days is simply "number of seconds / seconds_per_day").
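
   The two conventions in the list above ("total" values as in Joda-Time and 
chrono, versus mutually exclusive components as in pandas' `components`) can be 
sketched with plain Python `timedelta` arithmetic. This is only an illustration 
of the semantics, not a proposal for the kernel implementation; 
`split_components` is a hypothetical helper, and the floor division used here 
only matches truncation for non-negative durations.

```python
import datetime

td = datetime.timedelta(days=2, hours=3, seconds=4, milliseconds=5)

# "Total" semantics: the whole duration expressed in a single unit,
# with the fractional part dropped (floor division on timedeltas).
total_days = td // datetime.timedelta(days=1)
total_hours = td // datetime.timedelta(hours=1)
total_whole_seconds = td // datetime.timedelta(seconds=1)

# Float total, obtained by dividing by a duration of 1 second.
total_seconds_float = td / datetime.timedelta(seconds=1)


def split_components(duration):
    """Hypothetical helper: mutually exclusive components, as displayed."""
    days, rem = divmod(duration, datetime.timedelta(days=1))
    hours, rem = divmod(rem, datetime.timedelta(hours=1))
    minutes, rem = divmod(rem, datetime.timedelta(minutes=1))
    seconds, rem = divmod(rem, datetime.timedelta(seconds=1))
    milliseconds, rem = divmod(rem, datetime.timedelta(milliseconds=1))
    return {
        "days": days,
        "hours": hours,
        "minutes": minutes,
        "seconds": seconds,
        "milliseconds": milliseconds,
    }


components = split_components(td)
```

   For this value the total hours are 51, while the hours component is 3, which 
is the ambiguity a kernel named "hour" would need to resolve.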
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
