Davies Liu created SPARK-2983:
---------------------------------

             Summary: improve performance of sortByKey()
                 Key: SPARK-2983
                 URL: https://issues.apache.org/jira/browse/SPARK-2983
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 1.0.2, 0.9.0, 1.1.0
            Reporter: Davies Liu


For large datasets with many partitions (N), sortByKey() is very slow, 
because rangePartitioner takes O(N) time per key to find the target partition.

This could be improved by using binary search, which would reduce the lookup 
time to O(log N).
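
A rough sketch of the idea, not the actual patch: it assumes the partition 
boundaries are kept in a sorted list (called bounds here) and that a key goes 
to the partition whose index equals the number of boundaries strictly less 
than the key. The function names are made up for illustration.

```python
import bisect


def linear_partition(bounds, key):
    """O(N) lookup: scan the sorted boundaries from the left."""
    p = 0
    while p < len(bounds) and bounds[p] < key:
        p += 1
    return p


def bisect_partition(bounds, key):
    """O(log N) lookup: binary search over the same sorted boundaries."""
    # bisect_left returns the number of boundaries strictly less than key,
    # which is the index the linear scan above stops at.
    return bisect.bisect_left(bounds, key)


if __name__ == "__main__":
    bounds = [10, 20, 30]  # 3 boundaries split the key space into 4 partitions
    for key in (5, 10, 15, 30, 35):
        assert linear_partition(bounds, key) == bisect_partition(bounds, key)
        print(key, bisect_partition(bounds, key))
```

Both functions return the same partition index for every key, so the binary 
search could be swapped in without changing which partition a record lands in.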



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
