[
https://issues.apache.org/jira/browse/SPARK-20207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956403#comment-15956403
]
Mathew Wicks commented on SPARK-20207:
--------------------------------------
A toy example is given in the StackOverflow question linked in the issue.
An alternative solution would be to implement array concatenation. For most
aggregations (think SUM()), you can split the calculation into the 'before
current row' and 'after current row' partitions and combine the two results,
but for functions like COLLECT_LIST() this is not possible without a way to
concatenate the two resulting arrays.
There is precedent for array concatenation in SQL, for example ARRAY_CONCAT()
in BigQuery or ARRAY_CAT() in PostgreSQL.
http://www.w3resource.com/PostgreSQL/postgresql_array_cat-function.php
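The split into 'before current row' and 'after current row' can be sketched in plain Python (no Spark involved) to show why SUM() combines easily while COLLECT_LIST() needs array concatenation. The function names and the sample partition below are hypothetical, chosen only for illustration; this is not Spark's API.

```python
# Plain-Python sketch of excluding the current row from a windowed aggregate.
# All names here are hypothetical; nothing below is a Spark function.

def sum_excluding_current(values, i):
    # SUM splits cleanly: the two partial sums are scalars,
    # so they can be combined with ordinary addition.
    before = sum(values[:i])
    after = sum(values[i + 1:])
    return before + after


def collect_list_excluding_current(values, i):
    # COLLECT_LIST yields an array for each side, so combining the
    # two results requires array concatenation rather than '+' on scalars.
    before = values[:i]
    after = values[i + 1:]
    return before + after  # list concatenation


if __name__ == "__main__":
    partition = [10, 20, 30, 40]  # made-up rows of one window partition
    for i, value in enumerate(partition):
        print(value,
              sum_excluding_current(partition, i),
              collect_list_excluding_current(partition, i))
```

In SQL terms, the second function is what an ARRAY_CONCAT()-style function over two window frames would make possible.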
> Add ability to exclude current row in WindowSpec
> ------------------------------------------------
>
> Key: SPARK-20207
> URL: https://issues.apache.org/jira/browse/SPARK-20207
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.1.0
> Reporter: Mathew Wicks
> Priority: Minor
>
> It would be useful if we could implement a way to exclude the current row in
> WindowSpec. (We can currently only select ranges of rows/time.)
> Currently, users have to resort to ridiculous measures to exclude the current
> row from windowing aggregations.
> As seen here:
> http://stackoverflow.com/questions/43180723/spark-sql-excluding-the-current-row-in-partition-by-windowing-functions/43198839#43198839