Matthew Powers created SPARK-33995:
--------------------------------------

             Summary: Make datetime addition easier for years, weeks, hours, 
minutes, and seconds
                 Key: SPARK-33995
                 URL: https://issues.apache.org/jira/browse/SPARK-33995
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 3.1.0
            Reporter: Matthew Powers


There are add_months and date_add functions that make it easy to perform 
datetime addition with months and days, but there isn't an easy way to perform 
datetime addition with years, weeks, hours, minutes, or seconds with the 
Scala/Python/R APIs.

Users need to write code like expr("first_datetime + INTERVAL 2 hours") to add 
two hours to a timestamp with the Scala API, which isn't desirable.  We don't 
want to make Scala users manipulate SQL strings.

We can expose the [make_interval SQL 
function|https://github.com/apache/spark/pull/26446/files] to make any 
combination of datetime addition possible.  That'll make tons of different 
datetime addition operations possible and will be valuable for a wide array of 
users.

make_interval takes 7 arguments: years, months, weeks, days, hours, mins, and 
secs.

There are different ways to expose the make_interval functionality to 
Scala/Python/R users:
 * Option 1: Single make_interval function that takes 7 arguments
 * Option 2: expose a few interval functions
 ** make_date_interval function that takes years, months, days
 ** make_time_interval function that takes hours, minutes, seconds
 ** make_datetime_interval function that takes years, months, days, hours, 
minutes, seconds
 * Option 3: expose add_years, add_months, add_days, add_weeks, add_hours, 
add_minutes, and add_seconds as Column methods.  
 * Option 4: Expose the add_years, add_hours, etc. as column functions.  
add_weeks and date_add have already been exposed in this manner.  

Option 1 is nice from a maintenance perspective cause it's a single function, 
but it's not standard from a user perspective.  Most languages support datetime 
instantiation with these arguments: years, months, days, hours, minutes, 
seconds.  Mixing weeks into the equation is not standard.

As a user, Option 3 would be my preference.  
col("first_datetime").addHours(2).addSeconds(30) is easy for me to remember and 
type.  col("first_datetime") + make_time_interval(lit(2), lit(0), lit(30)) 
isn't as nice.  col("first_datetime") + make_interval(lit(0), lit(0), lit(0), 
lit(0), lit(2), lit(0), lit(30)) is harder still.

Any of these options is an improvement to the status quo.  Let me know what 
option you think is best and then I'll make a PR to implement it, building off 
of Max's foundational work of course ;)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to