xubo245 created SPARK-18275:
-------------------------------

             Summary: Why does not use an ordered queue in takeOrdered?
                 Key: SPARK-18275
                 URL: https://issues.apache.org/jira/browse/SPARK-18275
             Project: Spark
          Issue Type: Question
          Components: Spark Core
    Affects Versions: 2.0.1
            Reporter: xubo245
            Priority: Minor


Every partition in mapRDDs is defined as BoundedPriorityQueue object :

val queue = new BoundedPriorityQueue[T](num)(ord.reverse)
in org.apache.spark.rdd.RDD#takeOrdered

After mapRDDs.reduce,only return one queue and the queue is 
BoundedPriorityQueue ,so  after toArray , Is it necessary to use sorted in 
takeOrdered? If we can keep the queue is ordered,we can only use reverse

The same as in org.apache.spark.util.collection.Utils#takeOrdered, the leastOf 
method also use a unordered buffer ,why does not use a ordered queue? we can 
insert a num in O(log k) ,but the traditional quickselect algorithm take  O(k) 
time. Also we do not need a sort after selecting and  save  O(k * log k)

the leastOf is call 
com.google.common.collect.Ordering#leastOf(java.util.Iterator<E>, int)






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to