[
https://issues.apache.org/jira/browse/SPARK-25841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16666559#comment-16666559
]
Reynold Xin commented on SPARK-25841:
-------------------------------------
I posted API proposal sketches at
https://issues.apache.org/jira/browse/SPARK-25843
> Redesign window function rangeBetween API
> -----------------------------------------
>
> Key: SPARK-25841
> URL: https://issues.apache.org/jira/browse/SPARK-25841
> Project: Spark
> Issue Type: Umbrella
> Components: SQL
> Affects Versions: 2.3.2, 2.4.0
> Reporter: Reynold Xin
> Assignee: Reynold Xin
> Priority: Major
>
> As I was reviewing the Spark API changes for 2.4, I found that through
> organic, ad-hoc evolution the current API for window functions in Scala is
> pretty bad.
>
> To illustrate the problem, we have two rangeBetween functions in Window
> class:
>
> {code:java}
> class Window {
>   def unboundedPreceding: Long
>   ...
>   def rangeBetween(start: Long, end: Long): WindowSpec
>   def rangeBetween(start: Column, end: Column): WindowSpec
> }{code}
>
> The Column version of rangeBetween was added in Spark 2.3 because the
> previous version (Long) could only support integral values and not time
> intervals. Now in order to support specifying unboundedPreceding in the
> rangeBetween(Column, Column) API, we added an unboundedPreceding that returns
> a Column in functions.scala.
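>
> For example, both of the following express an unbounded-preceding frame
> today. This is a sketch against the current API; the ordering column "ts" is
> just for illustration:
> {code:java}
> import org.apache.spark.sql.expressions.Window
> import org.apache.spark.sql.functions._
>
> // Long-based API: Window.unboundedPreceding is a Long constant
> val spec1 = Window.orderBy("ts")
>   .rangeBetween(Window.unboundedPreceding, Window.currentRow)
>
> // Column-based API: functions.unboundedPreceding() returns a Column
> val spec2 = Window.orderBy("ts")
>   .rangeBetween(unboundedPreceding(), currentRow())
> {code}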
>
> There are a few issues I have with the API:
>
> 1. To the end user, this can be just super confusing. Why are there two
> unboundedPreceding functions, in different classes, that are named the same
> but return different types?
>
> 2. Using Column as the parameter signature implies this can be an actual
> Column, but in practice rangeBetween can only accept literal values.
>
> 3. We added the new APIs to support intervals, but they don't actually work.
> The implementation tries to validate that the start is less than the end, but
> calendar interval types are not comparable, so we throw a type mismatch
> exception at runtime:
> scala.MatchError: CalendarIntervalType (of class org.apache.spark.sql.types.CalendarIntervalType$)
>
> 4. In order to make interval work, users need to create an interval using
> CalendarInterval, which is an internal class that has no documentation and no
> stable API.
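>
> To make (2) and (4) concrete, here is a sketch against the current API
> (column names are illustrative, not from the issue):
> {code:java}
> import org.apache.spark.sql.expressions.Window
> import org.apache.spark.sql.functions._
> import org.apache.spark.unsafe.types.CalendarInterval
>
> // (2) This compiles, because the signature takes Columns, but the bounds
> // must in practice be literal values, not actual columns:
> val bad = Window.orderBy(col("ts")).rangeBetween(col("lo"), col("hi"))
>
> // (4) The only way to build an interval bound is through CalendarInterval,
> // an internal, undocumented class with no stable API:
> val oneHour = lit(CalendarInterval.fromString("interval 1 hour"))
> val spec = Window.orderBy(col("ts")).rangeBetween(currentRow(), oneHour)
> {code}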
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)