[
https://issues.apache.org/jira/browse/SPARK-11481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Davies Liu resolved SPARK-11481.
--------------------------------
Resolution: Fixed
Assignee: Davies Liu
Fix Version/s: 1.6.0, 1.5.2
Re-open this if it's different.
> orderBy with multiple columns in WindowSpec does not work properly
> ------------------------------------------------------------------
>
> Key: SPARK-11481
> URL: https://issues.apache.org/jira/browse/SPARK-11481
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Affects Versions: 1.5.1
> Environment: All
> Reporter: Jose Antonio
> Assignee: Davies Liu
> Labels: DataFrame, sparkSQL
> Fix For: 1.5.2, 1.6.0
>
>
> When multiple columns are passed to the orderBy of a WindowSpec, the ordering
> seems to be applied only to the first column.
> A possible workaround is to sort the DataFrame beforehand and then apply the
> window spec over the sorted DataFrame,
> e.g.
> THIS DOES NOT WORK:
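> import sys
> from pyspark.sql import Window, functions as func
> # Running sum per user, ordered by three columns
> window_sum = (Window.partitionBy('user_unique_id')
>               .orderBy('creation_date', 'mib_id', 'day')
>               .rowsBetween(-sys.maxsize, 0))
> df = df.withColumn('user_version',
>                    func.sum(df.group_counter).over(window_sum))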
> THIS WORKS WELL:
> df = df.sort('user_unique_id', 'creation_date', 'mib_id', 'day')
> window_sum = (Window.partitionBy('user_unique_id')
>               .orderBy('creation_date', 'mib_id', 'day')
>               .rowsBetween(-sys.maxsize, 0))
> df = df.withColumn('user_version',
>                    func.sum(df.group_counter).over(window_sum))
> Also, can anybody confirm that this is a true workaround?
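
For checking results, here is a minimal plain-Python sketch (no Spark involved) of what the window spec above is expected to compute: a running sum of group_counter per user_unique_id, with rows ordered by all three columns. The helper name running_sum and the sample rows are made up for illustration; only the column names come from the report.

```python
def running_sum(rows, partition_key, order_keys, value_key, out_key):
    """Reference running sum per partition, ordered by several columns.

    Hypothetical helper for illustration; not part of any Spark API.
    """
    def sort_key(row):
        # Partition first, then every ordering column in turn.
        return (row[partition_key],) + tuple(row[k] for k in order_keys)

    totals = {}
    result = []
    for row in sorted(rows, key=sort_key):
        part = row[partition_key]
        totals[part] = totals.get(part, 0) + row[value_key]
        # Copy the row and attach the cumulative total for its partition.
        result.append(dict(row, **{out_key: totals[part]}))
    return result


# Made-up sample data; the first three rows tie on creation_date, so the
# result depends on mib_id and day being used as secondary sort columns.
rows = [
    {"user_unique_id": "u1", "creation_date": "2015-11-01", "mib_id": 2, "day": 1, "group_counter": 1},
    {"user_unique_id": "u1", "creation_date": "2015-11-01", "mib_id": 1, "day": 1, "group_counter": 0},
    {"user_unique_id": "u1", "creation_date": "2015-11-01", "mib_id": 1, "day": 2, "group_counter": 1},
    {"user_unique_id": "u2", "creation_date": "2015-10-31", "mib_id": 5, "day": 3, "group_counter": 1},
]

expected = running_sum(
    rows,
    partition_key="user_unique_id",
    order_keys=["creation_date", "mib_id", "day"],
    value_key="group_counter",
    out_key="user_version",
)
```

Comparing this against the user_version column collected from a small DataFrame should show whether the secondary sort columns are being honoured.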