Re: the classic "which is the fastest way to sort" question

Uri Guttman Mon, 09 Nov 2009 09:41:26 -0800

>>>>> "MA" == Michael Alipio <daem0n...@yahoo.com> writes:


  MA> i'm planning to sort an input file (which was File::Slurp'ed, most
  MA> likely megabyte-sized file) in various ways. I did some readings
  MA> and learned several methods that people have come up with in
  MA> recent years. So to summarize, the default sort is fast (uses
  MA> quick sort), explicit (using sub) is a bit slower, other method
  MA> that uses caching is faster. Then there's Schwartzian Transform
  MA> and a packed version by Guttman. Seems like everything is
  MA> clear. Guttman is the fastest, until I went to cpan. Found
  MA> Sort::Key, which claims to be the fastest, even faster that ST,
  MA> GRT. Now, before someone says, why not try each one and see for
  MA> yourself (which doing such could be another subject for me to
  MA> learn), my question is this: if such faster sorting algorithms
  MA> exist, why don't they just replace the default sort function in
  MA> Perl?

you need to understand how those techniques work and some sort theory as
well. perl's internal sort is a generic sort on strings (using the
quicksort algorithm). you aren't going to get faster than that for
general data (if you have specific known types of data you can sometimes
optimize it een more). what all the techniques do is optimize key
extraction. instead of extracting the keys for each comparison (this is
O(N * log N) in sort theory notation) you do it once for each key which
is O(N). so you transfer a large amount of work from N * log N to order
N. and the more data you have, the more time you save. packing the keys
into strings is another optimization as you don't need the callback for
each comparision (sort will use the internal string comparison). so that
is why the GRT which does a packed sort is faster than the ST which
needs callbacks and array accessing during each comparison.

as for Sort::Key vs Sort::Maker's speed, i haven't run a benchmark on
them. this isn't hard to do and would be good for you as an
exercise. they have very different api's and designs so that is an issue
too. also sort::maker can generate all 4 styles of sorts and you can get
the generated code for reuse later. that could save some more time as
well.

  MA> And for the classical question, given my situation (in combination
  MA> with File::Slurp), which is fastest sort method? (I hope somebody
  MA> includes this in perlfaq in the future).

this isn't an FAQ so i doubt it will ever be added. sorting large data
sets in perl isn't that common. most large sorts are done in the DB
already or by external very fast sort programs. your megabyte size files
aren't even that large by today's standards (especially for slurping).

uri

-- 
Uri Guttman  ------  u...@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/

Re: the classic "which is the fastest way to sort" question

Reply via email to