[
https://issues.apache.org/jira/browse/SPARK-18275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen resolved SPARK-18275.
-------------------------------
Resolution: Not A Problem
Questions should go to user@
Priority queues are not necessarily sorted and in fact are generally not sorted
because they use a heap representation internally. It's not true that inserts
are O(log k) if you are maintaining a sorted data structure.
> Why does not use an ordered queue in takeOrdered?
> -------------------------------------------------
>
> Key: SPARK-18275
> URL: https://issues.apache.org/jira/browse/SPARK-18275
> Project: Spark
> Issue Type: Question
> Components: Spark Core
> Affects Versions: 2.0.1
> Reporter: xubo245
> Priority: Minor
>
> Every partition in mapRDDs is defined as BoundedPriorityQueue object :
> val queue = new BoundedPriorityQueue[T](num)(ord.reverse)
> in org.apache.spark.rdd.RDD#takeOrdered
> After mapRDDs.reduce,only return one queue and the queue is
> BoundedPriorityQueue ,so after toArray , Is it necessary to use sorted in
> takeOrdered? If we can keep the queue is ordered,we can only use reverse
> The same as in org.apache.spark.util.collection.Utils#takeOrdered, the
> leastOf method also use a unordered buffer ,why does not use a ordered queue?
> we can insert a num in O(log k) ,but the traditional quickselect algorithm
> take O(k) time. Also we do not need a sort after selecting and save O(k *
> log k)
> the leastOf is call
> com.google.common.collect.Ordering#leastOf(java.util.Iterator<E>, int)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]