http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58437

--- Comment #13 from Chris Jefferson <chris at bubblescope dot net> ---
Created attachment 30861
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30861&action=edit
Sort patch

Wow, this an embarrassing bug to get through testing. Obviously not enough
profiling done!

This patch (I hope) gets performance back to g++-4.4 levels, and in my
experiments slightly (but not noticeably) better. I would be interested in
anyone trying out this patch on their own data.

For the interested, here is a description of the bug. I will assume you
understand quicksort!

Quicksort requires taking a pivot, then partitioning the array by that pivot.
Getting O(n log n) behaviour requires that you choose a "good" pivot. Like most
people, in g++ we choose the median of the first, middle, and last value in the
array as the pivot.

In the C++03 world, after we chose the pivot we took a copy of it. In C++11, we
cannot copy values in std::sort any more, only move and swap. 

Trying to keep track of the pivot as it moves around during the partitioning is
horrible and inefficient. So, I had the "bright idea" to do swap(*first,
*pivot). Now the pivot is at the first location in the array, and we can just
partition [first+1, end) and the partitioning is fine. See the mistake yet?

The bug comes when we recursively partition the first cell. We again choose the
median from the first, mid and last of this cell, but *first is always the
biggest thing in the cell (as we pivoted on it one level above), meaning our
median calculation is unbalanced and often chooses terrible values, leading to
us often hitting the O(n^2) behaviour of quicksort (actually we technically get
o(n log n), as we switch to heapsort when things are going badly, but the time
is still bad).

There are a selection of ways of fixing this. This patch switches the median
calculation from considering then first, mid, last locations to on first+1,
mid, last-1 (there is no reason in this bug to consider last-1 instead of last,
but I thought it wouldn't do any harm, and might avoid other issues of
accidentally choosing a bad pivot.

Reply via email to