[jira] [Created] (SPARK-2983) improve performance of sortByKey()

Davies Liu (JIRA) Mon, 11 Aug 2014 23:07:27 -0700

Davies Liu created SPARK-2983:
---------------------------------

             Summary: improve performance of sortByKey()
                 Key: SPARK-2983
                 URL: https://issues.apache.org/jira/browse/SPARK-2983
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 1.0.2, 0.9.0, 1.1.0
            Reporter: Davies Liu



For large datasets with many partitions (N), sortByKey() is very slow, 
because the range partitioner takes O(N) time to find the target partition 
for a key.

This could be improved by using binary search over the partition boundaries, 
which reduces the lookup time to O(log N).
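
A minimal sketch of the idea, assuming the partition boundaries are held in 
a sorted list. The function names and the sample bounds are hypothetical and 
only illustrate the two lookup strategies; this is not the actual PySpark 
code.

```python
import bisect


def linear_partition(bounds, key):
    """Scan the sorted boundaries one by one: O(N) per key."""
    p = 0
    while p < len(bounds) and key > bounds[p]:
        p += 1
    return p


def bisect_partition(bounds, key):
    """Binary search over the same sorted boundaries: O(log N) per key."""
    # bisect_left returns the first index whose boundary is >= key,
    # which is the index where the linear scan above stops.
    return bisect.bisect_left(bounds, key)


if __name__ == "__main__":
    bounds = [10, 20, 30]  # hypothetical boundaries for 4 partitions
    for key in (5, 10, 11, 30, 31):
        print(key, linear_partition(bounds, key), bisect_partition(bounds, key))
```

Both functions should return the same partition index for any key, so the 
binary search could replace the linear scan without changing which 
partition a record lands in.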



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
