John W. Krahn writes:
> Bryan Harris wrote:
> >
> > >> Sometimes perl isn't quite the right tool for the job...
> > >>
> > >> % man sort
> > >> % man uniq
> > >
> > > If you code it correctly (unlike the program at the URL above) then a
> > > perl version will be more efficient and faster than using sort and uniq.
> >
> > Please explain...
> >
> > That's the last conclusion I thought anyone would be able to reach.
>
> How about a little demo.  The times posted are the fastest from ten runs
> of the same programs.
>
> $ perl -le'print int(rand(10_000)+50_000) for 1 .. 1_000_000' > random.txt
> $ time sort random.txt | uniq > sorted.shell
>
> real    0m38.799s
> user    0m34.880s
> sys     0m2.920s
> $ time sort -u random.txt > sorted.shell
>
> real    0m23.452s
> user    0m22.520s
> sys     0m0.720s
> $ time perl -lne'$h{$_}=()}{print for sort keys%h' random.txt > sorted.perl
>
> real    0m18.450s
> user    0m17.880s
> sys     0m0.450s
> $ diff -s sorted.shell sorted.perl
> Files sorted.shell and sorted.perl are identical
>
>
> The "sort | uniq" version has to run two processes and pass the whole
> file through the pipe from one process to the next.  The "sort -u"
> version has to sort the whole file first and then outputs only the
> unique values.  The perl version uses a hash to store the unique values
> first and then outputs the sorted values.  Depending on the number of
> duplicate values, the perl version will usually be faster as it has to
> sort a smaller list.
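For anyone who wants to check the three approaches side by side on a small input first, here is a minimal sketch. It is not the benchmark quoted above: the function names and the six-line sample are made up for illustration, and awk's associative array stands in for the Perl hash so the whole thing stays in plain shell.

```shell
#!/bin/sh
# Three ways to get sorted unique lines from the file named in $1.
# The function names are hypothetical, chosen only for this sketch.
via_pipe()   { sort "$1" | uniq; }               # two processes, all data crosses the pipe
via_sort_u() { sort -u "$1"; }                   # one process, no pipe
via_hash()   { awk '!seen[$0]++' "$1" | sort; }  # dedup in a hash first, then sort only unique lines

# Hypothetical six-line sample; the quoted demo used a million lines.
sample=$(mktemp)
printf '%s\n' 50003 50001 50003 50002 50001 50003 > "$sample"

if [ "$(via_pipe "$sample")" = "$(via_sort_u "$sample")" ] &&
   [ "$(via_sort_u "$sample")" = "$(via_hash "$sample")" ]; then
    echo "all three outputs are identical"
fi
```

The awk filter prints a line only the first time it is seen, so the final sort gets three lines here instead of six. That is the same "sort a smaller list" effect described in the quoted explanation.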

But how do they compare when the hash is too big to fit in main memory?
If the hash starts swapping, you lose!  I do not know, however, whether
a database-backed (on-disk) hash would be faster or slower than the
sort -u approach.  It would make for an interesting test.  Try sorting a
file with over 3E8 unique integers, or maybe just a file with 256-byte
records and enough unique records that they do not fit in memory.

> John
> --
> use Perl;
> program
> fulfillment
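
A scaled-down way to try the sort -u side of that test, as a rough sketch only. It assumes GNU coreutils sort, where -S caps the in-memory buffer and -T names the directory for temporary files. The data set (200000 distinct integers, each written twice) and the 1M buffer are hypothetical stand-ins for "more unique records than fit in memory". It does not cover the database-backed hash at all.

```shell
#!/bin/sh
# Assumes GNU coreutils sort: -S limits the sort buffer, -T sets the temp directory.
workdir=$(mktemp -d)

# Hypothetical test data: 200000 distinct integers in descending order, each emitted twice.
seq 200000 -1 1 | awk '{ print; print }' > "$workdir/input.txt"

# Deliberately small buffer, so the input does not fit in it and sort
# is expected to spill to temporary files under $workdir.
sort -n -u -S 1M -T "$workdir" "$workdir/input.txt" > "$workdir/unique.txt"

unique_count=$(wc -l < "$workdir/unique.txt" | tr -d ' ')
echo "unique records: $unique_count"
```

Wrapping the sort line in time, and raising the record count until the hash version starts to swap, would give the comparison asked about above.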