Re: topN using a heap

Ivan Kazmenko via Digitalmars-d Mon, 18 Jan 2016 06:26:22 -0800

On Monday, 18 January 2016 at 12:00:10 UTC, Ivan Kazmenko wrote:

On Sunday, 17 January 2016 at 22:20:30 UTC, Andrei Alexandrescu wrote:

All - let me know how things can be further improved. Thx!

Here goes the test which shows quadratic behavior for the new version:

http://dpaste.dzfl.pl/e4b3bc26c3cf
(dpaste kills the slow code before it completes the task)

The inspiration is the paper "A Killer Adversary for Quicksort":
http://www.cs.dartmouth.edu/~doug/mdmspe.pdf
(I already mentioned it on the forums a while ago)

Ivan Kazmenko.


Perhaps I should include a textual summary as well.

The code on DPaste starts by constructing an array of Elements of size MAX_N; in the code, MAX_N is 50_000. After that, we run the victim function on our array. Here, the victim is topN (array, MAX_N / 2); it could be sort (array) or something else.

An Element contains, or rather, pretends to contain, an int value. Here is how Element is special: the result of a comparison between two Elements is decided on-the-fly. An Element can be either UNDECIDED or have a fixed value. Initially, all elements are UNDECIDED. When we compare two Elements and at least one of them has a fixed value, the comparison is resolved naturally, and an UNDECIDED element is greater than any fixed element. When we compare two UNDECIDED elements, the one which participated more in the comparisons so far gets a fixed value: greater than any other value fixed so far, but still less than UNDECIDED. This way, the results of old comparisons are consistent with the new fixed value.
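The comparison rule above can be sketched in a few lines. This is a Python illustration, not the D code from DPaste: the names Adversary, next_value and seen are made up for the sketch, and the tie-break (when both elements have taken part in equally many comparisons, the left operand gets fixed) is an assumption, since the description does not say what happens on a tie.

```python
UNDECIDED = None  # marker: no value has been fixed for this element yet


class Adversary:
    """Shared state (hypothetical helper): next value to hand out, comparison count."""

    def __init__(self):
        self.next_value = 0
        self.comparisons = 0


class Element:
    def __init__(self, adversary, index):
        self.adversary = adversary
        self.index = index        # original position in the array
        self.value = UNDECIDED
        self.seen = 0             # how many comparisons this element took part in

    def _fix(self):
        # New fixed values grow, so each one is greater than all values
        # fixed so far, yet still below every UNDECIDED element.
        self.value = self.adversary.next_value
        self.adversary.next_value += 1

    def __lt__(self, other):
        self.adversary.comparisons += 1
        self.seen += 1
        other.seen += 1
        if self.value is UNDECIDED and other.value is UNDECIDED:
            # Fix the element that participated more (ties: left operand,
            # an assumption of this sketch).
            (self if self.seen >= other.seen else other)._fix()
        if self.value is UNDECIDED:
            return False          # UNDECIDED is greater than any fixed value
        if other.value is UNDECIDED:
            return True
        return self.value < other.value
```

Only `<` is defined here, so the sketch suits a victim that compares exclusively through `<`; a victim using other operators would need the remaining comparison methods as well.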

Now, what do we achieve by running the victim function? It turns out that the algorithms using the idea of QuickSort or QuickSelect tend to make most comparisons against their current pivot value. Our Element responds to that by fixing the pivot to one of the lowest available values. After that, a partition using such a pivot will have only a few, O(1), elements before the pivot, and the rest after the pivot. In total, this will lead to quadratic performance.
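To see why O(1) elements before the pivot means quadratic work, here is a small Python sketch that counts comparisons. It is not topN and it does not use the adversarial Element: it is a hand-written quickselect with a first-element pivot, run on already sorted input, which is the simplest way to force every pivot to be the minimum of its range. The function name quickselect_comparisons is made up for the illustration.

```python
def quickselect_comparisons(data, k):
    """Return (k-th smallest value, number of comparisons made).

    Quickselect with a first-element pivot and a Lomuto-style partition.
    """
    a = list(data)
    lo, hi = 0, len(a) - 1
    count = 0
    while True:
        pivot = a[lo]
        store = lo
        for i in range(lo + 1, hi + 1):
            count += 1                      # one comparison against the pivot
            if a[i] < pivot:
                store += 1
                a[store], a[i] = a[i], a[store]
        a[lo], a[store] = a[store], a[lo]   # move the pivot to its final place
        if store == k:
            return a[k], count
        if store < k:
            lo = store + 1                  # continue in the right part
        else:
            hi = store - 1                  # continue in the left part


# Every pivot is the minimum of its range, so each pass removes one element.
print(quickselect_comparisons(range(1000), 500))   # (500, 375249)
print(quickselect_comparisons(range(2000), 1000))  # (1000, 1500499)
```

Each pass over a range of m elements costs m - 1 comparisons and shrinks the range by one, so selecting the median costs roughly (n - 1) + (n - 2) + ... + n / 2, about 3n²/8 comparisons. Doubling n roughly quadruples the count, as the two printed results show.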

After running the victim function on our array of Elements (which, careful here, already takes quadratic time), we restore their original order (to do that, each Element also stores its original index).

Now, we can re-run the algorithm on the array obtained so far. If the victim function is (strongly) pure, it will inevitably make the same comparisons in the same order. The only difference is that their result will already be decided.

Alternatively, we can make an int array of the current values in our array of Elements (also in their original order). Running the victim function on the int array must also make the same (quadratic number of) comparisons in the same order.
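The whole procedure (attack, restore order, replay, freeze to ints) can be sketched end to end. Again this is a Python illustration under assumptions, not the DPaste code: the victim is a hand-written quickselect with a middle pivot rather than topN, the comparison rule is restated compactly as a `less` function so the block is self-contained, ties go to the left operand, and Elements that are still UNDECIDED at the end are given distinct int values above every fixed one. All names (Element, Adversary, select, attack) are made up for the sketch.

```python
class Element:
    def __init__(self, index):
        self.index = index   # original position, used to undo the reordering
        self.value = None    # None means UNDECIDED
        self.seen = 0        # comparisons this element took part in


class Adversary:
    def __init__(self):
        self.next_value = 0
        self.count = 0

    def less(self, x, y):
        self.count += 1
        x.seen += 1
        y.seen += 1
        if x.value is None and y.value is None:
            # Fix the busier element (ties go to x: an assumption).
            busier = x if x.seen >= y.seen else y
            busier.value = self.next_value
            self.next_value += 1
        if x.value is None:
            return False     # UNDECIDED is greater than any fixed value
        if y.value is None:
            return True
        return x.value < y.value


def select(a, k, less):
    """Victim: in-place quickselect, middle pivot, Lomuto-style partition."""
    lo, hi = 0, len(a) - 1
    while True:
        mid = (lo + hi) // 2
        a[lo], a[mid] = a[mid], a[lo]
        pivot, store = a[lo], lo
        for i in range(lo + 1, hi + 1):
            if less(a[i], pivot):
                store += 1
                a[store], a[i] = a[i], a[store]
        a[lo], a[store] = a[store], a[lo]
        if store == k:
            return a[k]
        if store < k:
            lo = store + 1
        else:
            hi = store - 1


def attack(n):
    """Return comparison counts: (first run, replay on Elements, run on ints)."""
    adv = Adversary()
    elems = [Element(i) for i in range(n)]
    select(elems, n // 2, adv.less)
    first = adv.count

    elems.sort(key=lambda e: e.index)   # restore the original order
    adv.count = 0
    select(elems, n // 2, adv.less)
    replay = adv.count

    elems.sort(key=lambda e: e.index)
    # Still-UNDECIDED elements get values above every fixed one (assumption).
    ints = [e.value if e.value is not None else n + e.index for e in elems]
    counter = [0]

    def int_less(x, y):
        counter[0] += 1
        return x < y

    select(ints, n // 2, int_less)
    return first, replay, counter[0]


for n in (400, 800):
    print(n, attack(n))
```

For this deterministic victim the three counts agree for a given n, and doubling n roughly quadruples them. The frozen int array is then an ordinary input that triggers the same number of comparisons without any special Element type.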


Ivan Kazmenko.
