[ https://issues.apache.org/jira/browse/SPARK-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matei Zaharia resolved SPARK-2983.
----------------------------------

    Resolution: Fixed
 Fix Version/s: 1.1.0

> improve performance of sortByKey()
> ----------------------------------
>
>                 Key: SPARK-2983
>                 URL: https://issues.apache.org/jira/browse/SPARK-2983
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 0.9.0, 1.1.0, 1.0.2
>            Reporter: Davies Liu
>            Assignee: Davies Liu
>             Fix For: 1.1.0
>
>
> For large datasets with many partitions (N), sortByKey() is very slow,
> because rangePartitioner takes O(N) time to find the partition for each key.
> Using binary search instead would reduce that per-key cost to O(log N).

--
This message was sent by Atlassian JIRA
(v6.2#6252)
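The following is a minimal sketch of the idea described in the issue. It is not the actual patch. It assumes that the range partitioner holds a sorted list of N-1 split points (`bounds`) and sends a key to the first partition whose bound is >= the key. All function names here are hypothetical. Keys are plain ints only to keep the example short. The standard-library `bisect.bisect_left` returns the same index as the linear scan, but it does so in O(log N) comparisons instead of O(N).

```python
import bisect
from typing import Callable, List


def linear_partition(key: int, bounds: List[int]) -> int:
    # Hypothetical O(N)-per-key version: scan the sorted bounds until
    # one is >= key; keys above every bound go to the last partition.
    p = 0
    while p < len(bounds) and key > bounds[p]:
        p += 1
    return p


def bisect_partition(key: int, bounds: List[int]) -> int:
    # O(log N) per key: bisect_left returns the first index whose bound
    # is >= key (len(bounds) if none is), matching the scan above.
    return bisect.bisect_left(bounds, key)


def make_range_partitioner(bounds: List[int],
                           ascending: bool = True) -> Callable[[int], int]:
    # len(bounds) split points define len(bounds) + 1 partitions.
    num_partitions = len(bounds) + 1

    def range_partitioner(key: int) -> int:
        p = bisect_partition(key, bounds)
        # For a descending sort, the largest keys go to partition 0.
        return p if ascending else num_partitions - 1 - p

    return range_partitioner


if __name__ == "__main__":
    bounds = [10, 20, 30]  # hypothetical sampled split points -> 4 partitions
    part = make_range_partitioner(bounds)
    print([part(k) for k in (5, 10, 15, 30, 31)])  # [0, 0, 1, 2, 3]
```

The partitioner is called once per record, so shrinking this lookup from O(N) to O(log N) matters most when N (the number of partitions) is large.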