Herman van Hovell tot Westerflier created SPARK-8816:
--------------------------------------------------------
Summary: Improve Performance of Unbounded Following Window Frame
Processing
Key: SPARK-8816
URL: https://issues.apache.org/jira/browse/SPARK-8816
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 1.4.0
Reporter: Herman van Hovell tot Westerflier
Priority: Minor
The performance of Unbounded Following frames in both the current and the
proposed (SPARK-8638) implementation of Window Functions is quite bad:
O(N*(N-1)/2).
A solution to this is to process such frames in reverse. This would effectively
reduce the complexity to O(N). The problem with this approach that it assumes
that AggregateExpression are communitative. Most are, but some are actually
order based: FIRST/LAST. There are two solution for this:
* Only allow communitative aggregates to processed in reverse order. In
practice this would mean, that a white list containing all allowed aggregates
is used, e.g. Sum, Average, Min, Max, ...
* Add functionality to WindowFunction or even better AggregateExpression, which
would allow us to get the reverse operator from the expression, and use this
reverse for processing, for example:
{noformat}
case class Max(child: Expression) extends AggregateExpression with Ordered {
...
def reverse: Min = Min(child)
}
{noformat}
The impact and extensibility of the first option are lower than of the second.
We might also want to asses how often such frames are used. It might not be a
problem at all.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]