Re: sorter script with MANY unique records

Robert Brown Tue, 09 Dec 2003 10:04:16 -0800

John W. Krahn writes:
 > Bryan Harris wrote:
 > > 
 > > >> Sometimes perl isn't quite the right tool for the job...
 > > >>
 > > >> % man sort
 > > >> % man uniq
 > > >
 > > > If you code it correctly (unlike the program at the URL above) then a
 > > > perl version will be more efficient and faster than using sort and uniq.
 > > 
 > > Please explain...
 > > 
 > > That's the last conclusion I thought anyone would be able to reach.
 > 
 > How about a little demo.  The times posted are the fastest from ten runs
 > of the same programs.
 > 
 > $ perl -le'print int(rand(10_000)+50_000) for 1 .. 1_000_000' > random.txt
 > $ time sort random.txt | uniq > sorted.shell
 > 
 > real    0m38.799s
 > user    0m34.880s
 > sys     0m2.920s
 > $ time sort -u random.txt > sorted.shell
 > 
 > real    0m23.452s
 > user    0m22.520s
 > sys     0m0.720s
 > $ time perl -lne'$h{$_}=()}{print for sort keys%h' random.txt > sorted.perl
 > 
 > real    0m18.450s
 > user    0m17.880s
 > sys     0m0.450s
 > $ diff -s sorted.shell sorted.perl
 > Files sorted.shell and sorted.perl are identical
 > 
 > 
 > The "sort | uniq" version has to run two processes and pass the whole
 > file through the pipe from one process to the next.  The "sort -u"
 > version has to sort the whole file first and then outputs only the
 > unique values.  The perl version uses a hash to store the unique values
 > first and then outputs the sorted values.  Depending on the number of
 > duplicate values, the perl version will usually be faster as it has to
 > sort a smaller list.


But how do they compare when the hash is too big to fit in main
memory?  If the hash starts swapping, you lose!  I do not know,
however, whether using a database-backed hash would be faster or slower
than the sort -u approach.  It would make for an interesting test.
Try sorting a file with over 3E8 unique integers, or maybe just a file
with 256-byte records and enough unique records to not fit in memory.

 > John
 > -- 
 > use Perl;
 > program
 > fulfillment

