Matthew Powers created SPARK-33995:
--------------------------------------
Summary: Make datetime addition easier for years, weeks, hours,
minutes, and seconds
Key: SPARK-33995
URL: https://issues.apache.org/jira/browse/SPARK-33995
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 3.1.0
Reporter: Matthew Powers
There are add_months and date_add functions that make it easy to perform
datetime addition with months and days, but there isn't an easy way to perform
datetime addition with years, weeks, hours, minutes, or seconds with the
Scala/Python/R APIs.
Users need to write code like expr("first_datetime + INTERVAL 2 hours") to add
two hours to a timestamp with the Scala API, which isn't desirable. We don't
want to make Scala users manipulate SQL strings.
We can expose the [make_interval SQL
function|https://github.com/apache/spark/pull/26446/files] to make any
combination of datetime addition possible. That'll make tons of different
datetime addition operations possible and will be valuable for a wide array of
users.
make_interval takes 7 arguments: years, months, weeks, days, hours, mins, and
secs.
There are different ways to expose the make_interval functionality to
Scala/Python/R users:
* Option 1: Single make_interval function that takes 7 arguments
* Option 2: expose a few interval functions
** make_date_interval function that takes years, months, days
** make_time_interval function that takes hours, minutes, seconds
** make_datetime_interval function that takes years, months, days, hours,
minutes, seconds
* Option 3: expose add_years, add_months, add_days, add_weeks, add_hours,
add_minutes, and add_seconds as Column methods.
* Option 4: Expose the add_years, add_hours, etc. as column functions.
add_weeks and date_add have already been exposed in this manner.
Option 1 is nice from a maintenance perspective cause it's a single function,
but it's not standard from a user perspective. Most languages support datetime
instantiation with these arguments: years, months, days, hours, minutes,
seconds. Mixing weeks into the equation is not standard.
As a user, Option 3 would be my preference.
col("first_datetime").addHours(2).addSeconds(30) is easy for me to remember and
type. col("first_datetime") + make_time_interval(lit(2), lit(0), lit(30))
isn't as nice. col("first_datetime") + make_interval(lit(0), lit(0), lit(0),
lit(0), lit(2), lit(0), lit(30)) is harder still.
Any of these options is an improvement to the status quo. Let me know what
option you think is best and then I'll make a PR to implement it, building off
of Max's foundational work of course ;)
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]