On 09/23/2016 04:45 PM, Jon Degenhardt wrote:
That's a very nice result. Both the absolute numbers and that all three
sets achieve similar performance, as they different distribution
characteristics.

Got curious so I tried to patch my ldc installation (0.17.1 (DMD v2.068.2, LLVM 3.8.0)), but sadly that didn't work out because sorting.d uses the new syntax in various places.

Probably the best baseline is the equivalent C++ code using the mature implementation of nth_element, see http://paste.ofcode.org/ieZPdchjzTXbDcG3LvsYBP (also pasted at the end of this message). Here's what I got: http://paste.ofcode.org/6feyBLRiJtieHdZ3bmGaUW, see also text below.

The topN code produced with dmd is faster in virtually all cases (has parity for -f 4, which I suspect is a degenerate case with most elements equal, which exercises only a small part of the algorithm). For C++ I used -O3 -DNDEBUG, any other flags I should use?

As an aside, I enjoy poking fun at the stunning inefficiency of iostreams in my public talks; they delivered again :o).


Andrei

=========================== shell
$ g++ -O3 -DNDEBUG -omedian_by_sort_cpp issue16517.cpp
$ g++ -O3 -DNDEBUG -DTOPN -omedian_by_topn_cpp issue16517.cpp
$ dmd -O -inline -release -ofmedian_by_sort -boundscheck=off issue16517.d $ dmd -O -inline -release -version=topn -ofmedian_by_topn -boundscheck=off issue16517.d $ cut -f 2 /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_sort_cpp median (via sort): 1972; lines: 10512769; total time (ms): 3419.11; sort time (ms): 314.086
$ cut -f 2  /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_sort
median (via sort): 1972; lines: 10512769; total time (ms): 1462; sort time (ms): 399
$ cut -f 2  /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_topn_cpp
median (via nth_element): 1972; lines: 10512769; total time (ms): 3255.41; nth_element time (ms): 40.676
$ cut -f 2  /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_topn
median (via topN): 1972; lines: 10512769; total time (ms): 1211; topN time (ms): 32
$ cut -f 3  /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_sort_cpp
median (via sort): 3; lines: 10512769; total time (ms): 2949.11; sort time (ms): 297.814
$ cut -f 3  /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_sort
median (via sort): 3; lines: 10512769; total time (ms): 1351; sort time (ms): 382
$ cut -f 3  /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_topn_cpp
median (via nth_element): 3; lines: 10512769; total time (ms): 2316.75; nth_element time (ms): 65.041
$ cut -f 3  /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_topn
median (via topN): 3; lines: 10512769; total time (ms): 973; topN time (ms): 59
$ cut -f 4  /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_sort_cpp
median (via sort): 2; lines: 10512769; total time (ms): 2614.47; sort time (ms): 351.285
$ cut -f 4  /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_sort
median (via sort): 2; lines: 10512769; total time (ms): 1269; sort time (ms): 361
$ cut -f 4  /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_topn_cpp
median (via nth_element): 2; lines: 10512769; total time (ms): 2443.17; nth_element time (ms): 76.211
$ cut -f 4  /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_topn
median (via topN): 2; lines: 10512769; total time (ms): 998; topN time (ms): 77
===========================

=========================== issue16517.cpp
#include <iostream>
#include <cstdio>
#include <ctime>
#include <vector>
#include <algorithm>

using namespace std;

int main()
{
    clock_t swTotal, swSort;
    swTotal = clock();
    vector<double> values;
    double num;
    while (cin >> num) {
        values.push_back(num);
    }

    size_t medianIndex = values.size() / 2;
    swSort = clock();
    #ifdef TOPN
        const char* method = "nth_element";
nth_element(values.begin(), values.begin() + medianIndex, values.end());
    #else
        const char* method = "sort";
        sort(values.begin(), values.end());
    #endif
    clock_t done = clock();

printf("median (via %s): %g; lines: %zu; total time (ms): %g; %s time (ms): %g\n",
             method, values[medianIndex], values.size(),
             (done - swTotal) * 1000. / CLOCKS_PER_SEC,
             method, (done - swSort) * 1000. / CLOCKS_PER_SEC);
}
===========================

=========================== issue16517.d
import std.stdio;
import std.array : appender;
import std.algorithm : sort, topN;
import std.conv;
import std.range;
import std.datetime: StopWatch;

void main()
{
    StopWatch swTotal, swSort;
    swTotal.start;
    Appender!(double[]) values;
    foreach (line; stdin.byLine)
        values.put(line.to!double);

    size_t medianIndex = values.data.length/2;
    swSort.start;
    version(topn)
    {
        topN(values.data, medianIndex);
        auto method = "topN";
    }
    else {
        sort(values.data);
        auto method = "sort";
    }
    swTotal.stop;
    swSort.stop;

writefln("median (via %s): %g; lines: %d; total time (ms): %d; %s time (ms): %d", method, values.data[medianIndex], values.data.length, swTotal.peek.msecs,
             method, swSort.peek.msecs);
}
===========================

Reply via email to