>>>>> "MA" == Michael Alipio <daem0n...@yahoo.com> writes:
MA> i'm planning to sort an input file (which was File::Slurp'ed, most MA> likely megabyte-sized file) in various ways. I did some readings MA> and learned several methods that people have come up with in MA> recent years. So to summarize, the default sort is fast (uses MA> quick sort), explicit (using sub) is a bit slower, other method MA> that uses caching is faster. Then there's Schwartzian Transform MA> and a packed version by Guttman. Seems like everything is MA> clear. Guttman is the fastest, until I went to cpan. Found MA> Sort::Key, which claims to be the fastest, even faster that ST, MA> GRT. Now, before someone says, why not try each one and see for MA> yourself (which doing such could be another subject for me to MA> learn), my question is this: if such faster sorting algorithms MA> exist, why don't they just replace the default sort function in MA> Perl? you need to understand how those techniques work and some sort theory as well. perl's internal sort is a generic sort on strings (using the quicksort algorithm). you aren't going to get faster than that for general data (if you have specific known types of data you can sometimes optimize it een more). what all the techniques do is optimize key extraction. instead of extracting the keys for each comparison (this is O(N * log N) in sort theory notation) you do it once for each key which is O(N). so you transfer a large amount of work from N * log N to order N. and the more data you have, the more time you save. packing the keys into strings is another optimization as you don't need the callback for each comparision (sort will use the internal string comparison). so that is why the GRT which does a packed sort is faster than the ST which needs callbacks and array accessing during each comparison. as for Sort::Key vs Sort::Maker's speed, i haven't run a benchmark on them. this isn't hard to do and would be good for you as an exercise. they have very different api's and designs so that is an issue too. also sort::maker can generate all 4 styles of sorts and you can get the generated code for reuse later. that could save some more time as well. MA> And for the classical question, given my situation (in combination MA> with File::Slurp), which is fastest sort method? (I hope somebody MA> includes this in perlfaq in the future). this isn't an FAQ so i doubt it will ever be added. sorting large data sets in perl isn't that common. most large sorts are done in the DB already or by external very fast sort programs. your megabyte size files aren't even that large by today's standards (especially for slurping). uri -- Uri Guttman ------ u...@stemsystems.com -------- http://www.sysarch.com -- ----- Perl Code Review , Architecture, Development, Training, Support ------ --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com --------- -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/