This is a patch for bug 20837 - "libc++ std::sort has O(n^2) worst
case, standard mandates O(N log(N))" found at
http://llvm.org/bugs/show_bug.cgi?id=20837 . Description and
attachments copied in this email as suggested by Hal Finkel.


The C++ standard mandates that `std::sort` has complexity O(N log(N)),
but libc++'s implementation takes O(N^2) in the worst case.

As proof I've attached a program that constructs a worst case input
for several sizes. It also instruments the number of comparisons used
to sort these worst cases and prints the results. The technique used
is described in the 1999 paper "A Killer Adversary for Quicksort" by
M. D. McIlroy.

Compiling and running:

$ clang++ -O2 -m64 -march=native -std=c++11 -stdlib=libc++ main.cpp
-nodefaultlibs -lc++ -lc++abi -lm -lc -lgcc_s -lgcc && ./a.out
N: comparisons
100: 2479
200: 10229
400: 40729
800: 161729
1600: 448698
3200: 1413898
6400: 5264298

This is clearly quadratic. To illustrate this defect more, these are
the comparison counts given by compiling using libstdc++:

$ clang++ -O2 -m64 -march=native -std=c++11 main.cpp && ./a.out
N: comparisons
100: 1742
200: 4260
400: 10035
800: 22784
1600: 51049
3200: 112386
6400: 244850

Inspecting the source of <algorithm> shows the cause of the issue:
there is no introsort implemented, only quicksort (albeit with
non-trivial optimizations, but nothing preventing the worst case).
I've written a patch that augments the current implementation to make
it work as an introspective sort, switching to heapsort if the
recursion depth exceeds 2*floor(log(N)). The patch is attached to this
email.

Having not contributed to libc++ before I'm not 100% familiar with all
coding styles/(un)written rules. My changes are rather minimal though,
so I've just followed what style could be found in context. And of
course I knowingly submit my code under the libc++ licenses, the MIT
license and the UIUC License.

Attachment: stdsort_introspective.patch
Description: Binary data

#include <algorithm>
#include <iostream>
#include <vector>

void make_killer(int size, std::vector<int>& v) {
    int candidate = 0;
    int num_solid = 0;
    int gas = size - 1;

    std::vector<int> tmp(size);
    v.resize(size);

    for (int i = 0; i < size; ++i) {
        tmp[i] = i;
        v[i] = gas;
    }

    std::sort(tmp.begin(), tmp.end(), [&](int x, int y) {
        if (v[x] == gas && v[y] == gas) {
            if (x == candidate) v[x] = num_solid++;
            else v[y] = num_solid++;
        }

        if (v[x] == gas) candidate = x;
        else if (v[y] == gas) candidate = y;

        return v[x] < v[y];
    });
}

int main(int argc, char** argv) {
    std::vector<int> v;
    int comparison_count;

    auto counter = [&](int x, int y) { ++comparison_count; return x < y; };

    std::cout << "N: comparisons\n";
    for (int i = 100; i <= 6400; i *= 2) {
        // to nullify small constants we multiply by 100
        make_killer(i, v);

        comparison_count = 0;
        std::sort(v.begin(), v.end(), counter);
        std::cout << i << ": " << comparison_count << "\n";
    }

    return 0;
}
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Reply via email to