xubo245 created SPARK-18275:
-------------------------------
Summary: Why does not use an ordered queue in takeOrdered?
Key: SPARK-18275
URL: https://issues.apache.org/jira/browse/SPARK-18275
Project: Spark
Issue Type: Question
Components: Spark Core
Affects Versions: 2.0.1
Reporter: xubo245
Priority: Minor
Every partition in mapRDDs is defined as BoundedPriorityQueue object :
val queue = new BoundedPriorityQueue[T](num)(ord.reverse)
in org.apache.spark.rdd.RDD#takeOrdered
After mapRDDs.reduce,only return one queue and the queue is
BoundedPriorityQueue ,so after toArray , Is it necessary to use sorted in
takeOrdered? If we can keep the queue is ordered,we can only use reverse
The same as in org.apache.spark.util.collection.Utils#takeOrdered, the leastOf
method also use a unordered buffer ,why does not use a ordered queue? we can
insert a num in O(log k) ,but the traditional quickselect algorithm take O(k)
time. Also we do not need a sort after selecting and save O(k * log k)
the leastOf is call
com.google.common.collect.Ordering#leastOf(java.util.Iterator<E>, int)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]