This is a patch for bug 20837 - "libc++ std::sort has O(n^2) worst case, standard mandates O(N log(N))" found at http://llvm.org/bugs/show_bug.cgi?id=20837 . Description and attachments copied in this email as suggested by Hal Finkel.
The C++ standard mandates that `std::sort` has complexity O(N log(N)), but libc++'s implementation takes O(N^2) in the worst case. As proof I've attached a program that constructs a worst case input for several sizes. It also instruments the number of comparisons used to sort these worst cases and prints the results. The technique used is described in the 1999 paper "A Killer Adversary for Quicksort" by M. D. McIlroy. Compiling and running: $ clang++ -O2 -m64 -march=native -std=c++11 -stdlib=libc++ main.cpp -nodefaultlibs -lc++ -lc++abi -lm -lc -lgcc_s -lgcc && ./a.out N: comparisons 100: 2479 200: 10229 400: 40729 800: 161729 1600: 448698 3200: 1413898 6400: 5264298 This is clearly quadratic. To illustrate this defect more, these are the comparison counts given by compiling using libstdc++: $ clang++ -O2 -m64 -march=native -std=c++11 main.cpp && ./a.out N: comparisons 100: 1742 200: 4260 400: 10035 800: 22784 1600: 51049 3200: 112386 6400: 244850 Inspecting the source of <algorithm> shows the cause of the issue: there is no introsort implemented, only quicksort (albeit with non-trivial optimizations, but nothing preventing the worst case). I've written a patch that augments the current implementation to make it work as an introspective sort, switching to heapsort if the recursion depth exceeds 2*floor(log(N)). The patch is attached to this email. Having not contributed to libc++ before I'm not 100% familiar with all coding styles/(un)written rules. My changes are rather minimal though, so I've just followed what style could be found in context. And of course I knowingly submit my code under the libc++ licenses, the MIT license and the UIUC License.
stdsort_introspective.patch
Description: Binary data
#include <algorithm>
#include <iostream>
#include <vector>
void make_killer(int size, std::vector<int>& v) {
int candidate = 0;
int num_solid = 0;
int gas = size - 1;
std::vector<int> tmp(size);
v.resize(size);
for (int i = 0; i < size; ++i) {
tmp[i] = i;
v[i] = gas;
}
std::sort(tmp.begin(), tmp.end(), [&](int x, int y) {
if (v[x] == gas && v[y] == gas) {
if (x == candidate) v[x] = num_solid++;
else v[y] = num_solid++;
}
if (v[x] == gas) candidate = x;
else if (v[y] == gas) candidate = y;
return v[x] < v[y];
});
}
int main(int argc, char** argv) {
std::vector<int> v;
int comparison_count;
auto counter = [&](int x, int y) { ++comparison_count; return x < y; };
std::cout << "N: comparisons\n";
for (int i = 100; i <= 6400; i *= 2) {
// to nullify small constants we multiply by 100
make_killer(i, v);
comparison_count = 0;
std::sort(v.begin(), v.end(), counter);
std::cout << i << ": " << comparison_count << "\n";
}
return 0;
}
_______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
