Re: Worst-case performance of quickSort / getPivot

Chris Cain Sat, 16 Nov 2013 21:26:51 -0800

On Sunday, 17 November 2013 at 04:14:22 UTC, Andrei Alexandrescuwrote:

Probably a good step forward would be to hook a sortbenchmarking corpus to our unittests. What are the consecratedcorpora?


I'll kick off the suggestions with this:


File to provide "real" data:
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_K_12_substr__MG1655_uid57779/NC_000913.fna

From http://www.genome.jp/kegg-bin/show_organism?org=eco ->RefSeq -> NC_000913.fna

(despite this being real data, typically you don't actually sortgenomes like this ... it's simply "more interesting" than apseudo-randomized dataset)


Code:

---

import std.range, std.algorithm, std.stdio, std.array, std.conv,std.datetime;


enum strategy = SwapStrategy.stable;

void main() {
    auto arr = File("NC_000913.fna").byLine

.drop(1) // Trim the first line because it's justmetadata

            /*.take(50)*/
            .join
            .chunks(50)
            .map!text
            .array;
    StopWatch sw = StopWatch(AutoStart.yes);
    arr.sort!("a < b", strategy)();
    sw.stop();

writeln(text(arr.length, " elements sorted in ",sw.peek.msecs, " msecs"));

}
---

On my system, this shows unstable sort to be ~2x faster thanstable sort (which is to be expected of data with no discerniblepatterns).

The "key" thing this is testing is the effectiveness of the sorton data that often takes longer to compare (with such a smallalphabet, many strings have a common prefix).

That said, it might also be reproduced "well enough" using arandom generator to create similar strings to sort, but the basicidea is there. I just like using real genomes for performancetesting things :)

Re: Worst-case performance of quickSort / getPivot

Reply via email to