On 7/10/06, Mr. Shawn H. Corey <[EMAIL PROTECTED]> wrote:
> Jay Savage wrote:
> > foreach ('a'..'z') {
> >     $recent{$_} = time;
> >     sleep 1;
> > }
>
> Ouch. The OP did mention his limit was 200. So he must have more than
> 200 elements to scan. This algorithm will take at least 3m20s, so it's
> hardly fast (which was one of the points of this exercise).
Not sure what kind of hardware you have, but it takes considerably
less than a second for me. Not counting the sleep(), of course, which
was just there to guarantee unique timestamps for the example. Assuming
the data being used for the keys (the "elements") is of a reasonable
size (for most values of "a reasonable size"), my example should easily
handle tens of thousands of "elements" (without the sleep ;) ).
Don't believe me? Try this:
#!/usr/bin/perl -w
use strict;
use Time::HiRes;

my %recent;
my $limit = 200;

sub get_list {
    # Sort keys newest-first by their stored timestamps.
    my @sorted = sort { $recent{$b} <=> $recent{$a} } keys %recent;

    # Anything beyond the limit falls off the list.
    foreach ( @sorted[ $limit .. $#sorted ] ) {
        delete $recent{$_};
    }
    return @sorted[ 0 .. $limit - 1 ];
}

# example usage:
my $time1 = Time::HiRes::time();

foreach ( 0 .. 10000 ) {
    $recent{$_} = Time::HiRes::time();
}

print get_list(), "\n";

while ( my ( $k, $v ) = each %recent ) {
    print "$k: $v\n";
}

my $time2 = Time::HiRes::time();
my $delta = $time2 - $time1;
print "\ntook $delta seconds\n";
For those 10,000 "records" the response averages around 0.5s on my
machine, which is a Windows workstation running all sorts of things in
the background, hardly optimal conditions. And that's still not a fair
benchmark, because the print time, by far the most time-consuming part
of the whole operation, is factored into the total. Hashes are fast,
and Perl's built-in sort isn't as shabby as most people think.
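For a fairer number, something like the core Benchmark module could
time just the sort-and-slice step, with the printing left out entirely.
This is only a rough sketch of mine, not part of the script above (it
also skips get_list()'s deletes, since pruning would shrink the hash
after the first pass):

    #!/usr/bin/perl -w
    use strict;
    use Benchmark qw(timethis);
    use Time::HiRes;

    my %recent;
    my $limit = 200;

    # Fill the hash with 10,000 "records", as in the example above.
    $recent{$_} = Time::HiRes::time() for 0 .. 10_000;

    # Time only the sort and slice; no printing, no deleting.
    timethis( 100, sub {
        my @sorted = sort { $recent{$b} <=> $recent{$a} } keys %recent;
        my @top    = @sorted[ 0 .. $limit - 1 ];
    } );

timethis() prints its own timing summary, so what you see is the cost
of the sort itself rather than the cost of the I/O.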
> Try:
> my $id = 0;
> for ( 'a' .. 'z' ) {
>     $recent{$_} = $id++;
> }
> Of course, this assumes you have enough memory for everything. These
> days, this is normally true for a personal computer, but web servers
> can run out of memory because they have so much else to do.
Yes, the problem here is that you're pretty much guaranteed to overflow
$id at some point, given a sufficiently long-running process, even
discounting the size of the hash. That's why I went for time() or
Time::HiRes::time() instead of an autoincrement; no muss, no fuss.
Unless you're still running your script on the same hardware in 2038.
Of course, since the OP didn't give any implementation details, it's
possible that the list could grow even larger between calls to
get_list(). Maybe the user will only want the list of 200 recent items
after an average of 1,000,000 elements have been added. Then, sure,
you'll want to take a different approach. If that's the case, I'd
recommend getting a copy of _Mastering Algorithms With Perl_ and
reading up on linked lists and B-trees.
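For what it's worth, here's a rough sketch of one way to handle that
without going all the way to a linked list; it's my own illustration,
not from the book, and the touch() sub and $high_mark threshold are
made-up names. The idea is to prune as you insert, so no single call
ever has to sort millions of keys:

    #!/usr/bin/perl -w
    use strict;
    use Time::HiRes;

    my %recent;
    my $limit     = 200;
    my $high_mark = $limit * 10;    # made-up pruning threshold

    # Record an access; prune back to $limit entries whenever the hash
    # grows past $high_mark, so %recent stays bounded between calls.
    sub touch {
        my ($item) = @_;
        $recent{$item} = Time::HiRes::time();
        if ( keys %recent > $high_mark ) {
            my @sorted = sort { $recent{$b} <=> $recent{$a} } keys %recent;
            delete @recent{ @sorted[ $limit .. $#sorted ] };
        }
    }

That keeps the cost per insert small on average, though a linked list
or B-tree will still win if the access pattern is truly hostile.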
In my experience, though, it's standard to size the list slightly
larger than the number of elements you expect to be added between
accesses. For most UI situations, the working definition of "recent
items" is "the most items we think a normal user will access between
views of 'recent items.'" The value of a recent-items list decreases
rapidly when users use more items in a session than the list can hold,
and demonstrably recent items get shifted off the bottom.
best,
-- jay