alamb opened a new issue, #7068:
URL: https://github.com/apache/arrow-datafusion/issues/7068

   ### Is your feature request related to a problem or challenge?
   
   Intervals (e.g. `1 month`, which is a different number of days depending on 
the relative dates) and Durations (e.g. `12321 seconds`) and  arithmetic have 
(subltely) different semantics
   
   SQL exposes intervals (not Durations), as the `INTERVAL` type
   
   DataFusion has historically used intervals inconsistently
   
   For example, subtracting timestamps that are one year apart results in an 
interval of 365 days, not 1 year
   
   ```sql
   ❯ select '2022-01-01T12:34:45'::timestamp - '2021-01-01T12:34:45'::timestamp;
   +-----------------------------------------------------------+
   | Utf8("2022-01-01T12:34:45") - Utf8("2021-01-01T12:34:45") |
   +-----------------------------------------------------------+
   | 0 years 0 mons 365 days 0 hours 0 mins 0.000000000 secs   |
   +-----------------------------------------------------------+
   1 row in set. Query took 0.002 seconds.
   ```
   
   Subtracting 11 months results in an interval of 334 days, rather than 11 
months.
   ```
   ❯ select '2022-01-01T12:34:45'::timestamp - '2021-02-01T12:34:45'::timestamp;
   +-----------------------------------------------------------+
   | Utf8("2022-01-01T12:34:45") - Utf8("2021-02-01T12:34:45") |
   +-----------------------------------------------------------+
   | 0 years 0 mons 334 days 0 hours 0 mins 0.000000000 secs   |
   +-----------------------------------------------------------+
   1 row in set. Query took 0.000 seconds.
   ```
   
   It appears that the current code assumes a day is 24 hours (which is not 
always correct given leap seconds, and other date/time oddities):
   ```
   ❯ select '2022-01-01T12:34:45'::timestamp - 
'2021-01-01T12:34:45.0432'::timestamp;
   +----------------------------------------------------------------+
   | Utf8("2022-01-01T12:34:45") - Utf8("2021-01-01T12:34:45.0432") |
   +----------------------------------------------------------------+
   | 0 years 0 mons 364 days 23 hours 59 mins 59.956800000 secs     |
   +----------------------------------------------------------------+
   1 row in set. Query took 0.000 seconds.
   ```
   
   
   ### Describe the solution you'd like
   
   To avoid inconsistencies and make sure the semantics are clear, @tustvold 
proposes to change timestamp / timestamp arithmetic to use Durations 
consistently. This will mean subtracting two timestamps will result in an 
actual `Duration` array rather than an `Interval` which really encodes a 
duration
   
   ### Describe alternatives you've considered
   
   Leave the existing system as is
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to