[algogeeks] Re: BEST SORTING TECHNIQUE

michael Wed, 01 Mar 2006 06:17:17 -0800

Hi Adak,

I agree with some of your points but I do disagree with others.
For clarity, I'll snip your comments inline below.   Note:  when I
refer to quicksort, I am including its variants such as
introsort/ternary
quicksort etc.



> Sorry to say, but the REAL answer is "The Best Sorting Algorithm"
> depends always on WHAT you're sorting, AND on what it's current
> "sortedness" state is.

True, but generally, this is still quicksort which is best for most
things
you might be sorting.  And, I argue that as important as "what you are
sorting"
is "on what you are sorting".  That is, the device which is doing the
sorting.
ie. The cache.

Locality of reference/cache performance is a huge factor in the real
world performance of sorting algorithms.  This is why I almost always
find
radix sort and merge sort to fall far short of quicksort.


>
> For instance, say you're program's task is to add a few names or
> records into a large set of data, like a college directory. That
> directory is itself, already sorted (maybe by an index, but it's the
> same thing). In this situation, clearly no algoriithm that completely
> re-sorts the entire directory, will be fastest. All you need to do is
> code a simple but fast binary or indexed search to find where the new
> record should be inserted, and then adjust the indecies (records), that
> have been affected. No general purpose sort can touch it.

True, but I don't see this as addressing the issue of sorting directly.

>
> If the data has very few keys (like an integer), then Radix sort will
> surely shine, but the data must be held in memory. Reading each
> number's digits from a file on a disk somewhere over a network
> (especially), just is NOT fast.

Radix sort will work fine.  But I still put my money on
quicksort/introsort.
It should not be too hard to find the fastest radix sort (or code a
fast one)
and toss it alone side my test app for sorting unsigned ints in various
tests.
Then we can see for sure.  But I know that Radix sort will suffer
performance
penalties compared to quicksort because of its cache oblivious nature.

>
> For working with huge files of data, mergesort is the best choice.
> Saying you're going to beat it is nice - but it's like trying to
> outswim a tuna - not likely in practice.

I don't believe this.  Generally, when the data is partitioned small
enough
to fit into memory, it would be wise to then switch to quicksort if
possible.
So, i would agree with you that no one sort is best in this case, but
it
probably isn't correct to say merge sort is the best choice either.

>
> For general purpose sorting, where the stability of the sort isn't
> needed (and by stability I mean the order of the original data is
> preserved), Quicksort is best. But to sort less than 1,000 items today,
> I typically use Shell sort - just because it's smaller, and simpler,
> and the difference with today's computers in speed for sorting 1,000
> items, is almost unnoticed.
>

Why?  We have the quicksort already written in the STL.  That's even
cleaner,
faster, and easier to use than adding an additional sort.  Isn't it?
Well, assuming
you have the use of the STL.

> Both Radix and Quicksort can be fine-tuned until your eyes glaze over,
> but that's not what I want. I want to spend my time on the program, or
> just on life - not fiddling around to get the last 1% (0.01) out of the
> performance. Still, there are times... and I prefer to fine-tune
> Quicksort, in that case. Long keys or if the data grows a lot bigger
> than expected, it's no problem for Quicksort.

Some people live for fiddling around to get that last 1% performance.
Some people live for windsurfing.  (^:


>
> For performance, ALWAYS TEST, ALWAYS TEST, ALWAYS TEST, and lastly,
> always test. All the pontificating or mathematically correct abstracts
> for performance, NEVER EQUAL A SINGLE TEST.  It's like the jockey's got
> together and "decided" what horse would win today's race, because
> mathematically..., instead of running the actual race!

100% correct.  The chalkboard is not the real world.
So let's get radix sort of the chalkboard too and put it up against
quicksort in my
test app.  What implementation radix sort (for integers) would you
suggest I use.

>
> The ONE technique which can beat ALL the sorting algorithm, and may
> give you the sorted data you need, is the distribution count or "bucket
> sort". As I mentioned above, NOTHING can touch it, but it's not always
> an option - since nothing is actually sorted.

Huh?  If it doesn't have to actually sort, then I claim my potato
peeler to be the
fastest sorter of them all!  (^:  Seriously, I don't quite follow this
part.  Do you mean
for cases where we don't actually need the sorted data, just their
sorted order?

- Michael

> 
> Cleared this right up, didn't I? <grin>
> 
> Adak


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Algorithm Geeks" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/algogeeks
-~----------~----~----~----~------~----~------~--~---

[algogeeks] Re: BEST SORTING TECHNIQUE

Reply via email to