On Saturday, 16 January 2016 at 15:25:50 UTC, Andrei Alexandrescu
wrote:
https://github.com/D-Programming-Language/phobos/pull/3934
So, say you're looking for the smallest 10 elements out of
100_000. The quickselect algorithm (which topN currently uses)
will successively partition the set in (assuming the pivot
choice works well) 50_000, 25_000, etc chunks all the way down
to finding the smallest 10.
That's quite a bit of work, so 3934 uses an alternate strategy
for finding the smallest 10:
1. Organize the first 11 elements into a max heap
2. Scan all other elements progressively. Whenever an element
is found that is smaller than the largest in the heap, swap it
with the largest in the heap then restore the heap property.
3. At the end, swap the largest in the heap with the 10th and
you're done!
This is very effective, and is seldom referred in the selection
literature. In fact, a more inefficient approach (heapify the
entire range) is discussed more often.
Destroy!
Andrei
A common way to do it is to go quicksort, but only recurse on one
side of the set. That should give log(n)^2 complexity on average.