On Friday, July 13, 2001, at 12:46 , Daniel S. Wilkerson wrote:
> Well, lets not be too theoretical. For short lists,
> Jorg's way is probably faster. Hash accesses look
> nice and neat, but lots is going on for each one.
Why speculate, when you can instrument...ate?
craigc@samantha:~/fwp$ ./setsub
Benchmark: running test1_joerg_big, test1_joerg_shuffled,
test1_joerg_small, test2_japhy_big, test2_japhy_shuffled,
test2_japhy_small, test3_robin_big, test3_robin_shuffled,
test3_robin_small, test4_adam_big, test4_adam_shuffled,
test4_adam_small, test5_bits_big, test5_bits_shuffled,
test5_bits_small, each for at least 3 CPU seconds...
test1_joerg_big: 86 wallclock secs (84.00 usr + 0.00 sys =
84.00 CPU) @ 0.01/s (n=1)
(warning: too few iterations for a reliable count)
test1_joerg_shuffled: 86 wallclock secs (84.31 usr + 0.00 sys =
84.31 CPU) @ 0.01/s (n=1)
(warning: too few iterations for a reliable count)
test1_joerg_small: 3 wallclock secs ( 3.01 usr + 0.00 sys =
3.01 CPU) @ 5515.61/s (n=16602)
test2_japhy_big: 3 wallclock secs ( 3.01 usr + 0.00 sys =
3.01 CPU) @ 21.93/s (n=66)
test2_japhy_shuffled: 4 wallclock secs ( 3.15 usr + 0.00
sys = 3.15 CPU) @ 21.27/s (n=67)
test2_japhy_small: 3 wallclock secs ( 3.11 usr + 0.00 sys =
3.11 CPU) @ 3778.46/s (n=11751)
test3_robin_big: 3 wallclock secs ( 3.08 usr + 0.00 sys =
3.08 CPU) @ 17.21/s (n=53)
test3_robin_shuffled: 3 wallclock secs ( 3.08 usr + 0.00
sys = 3.08 CPU) @ 15.26/s (n=47)
test3_robin_small: 3 wallclock secs ( 3.15 usr + 0.00 sys =
3.15 CPU) @ 2096.83/s (n=6605)
test4_adam_big: 3 wallclock secs ( 3.03 usr + 0.00 sys = 3.03
CPU) @ 3.96/s (n=12)
test4_adam_shuffled: 3 wallclock secs ( 3.14 usr + 0.00 sys =
3.14 CPU) @ 3.82/s (n=12)
test4_adam_small: 4 wallclock secs ( 3.16 usr + 0.00 sys =
3.16 CPU) @ 4056.65/s (n=12819)
test5_bits_big: 3 wallclock secs ( 3.16 usr + 0.00 sys = 3.16
CPU) @ 10.76/s (n=34)
test5_bits_shuffled: 3 wallclock secs ( 3.10 usr + 0.00 sys =
3.10 CPU) @ 10.65/s (n=33)
test5_bits_small: 2 wallclock secs ( 3.00 usr + 0.00 sys =
3.00 CPU) @ 7415.67/s (n=22247)
craigc@samantha:~/fwp$
Conclusions:
1. J�rg's method of using nested greps works really well for
small sets, but degrades poorly.
2. Hashes degrade much more gracefully.
3. Adam's method of manually iterating through both sets
degrades linearly, suggesting that it is bound by the
performance of the Perl interpreter, not the data structures.
4. Bitfields, at least as I have implemented them, fall
somewhere between nested greps and hashes. I suspect the
explicit loops required to populate, manipulate, and extract the
sets from the bitfield.
Full code below .sig.
--
Craig S. Cottingham
[EMAIL PROTECTED]
PGP key available from:
<http://pgp.ai.mit.edu:11371/pks/lookup?op=get&search=0xA2FFBE41>
ID=0xA2FFBE41, fingerprint=6AA8 2E28 2404 8A95 B8FC 7EFC 136F
0CEF A2FF BE41
===== snip here =====
#!/usr/bin/perl -w
use Benchmark qw(cmpthese timethese);
my @small_ints = (1..10);
my @small_evens = map { $_ * 2 } (1..(@small_ints / 2));
my @small_odds = map { ($_ * 2) - 1 } (1..(@small_ints / 2));
my @big_ints = (1..10_000);
my @big_evens = map { $_ * 2 } (1..(@big_ints / 2));
my @big_odds = map { ($_ * 2) - 1 } (1..(@big_ints / 2));
my @shuffled_ints;
{
my @temp = @big_ints;
while (@temp)
{
my $index = int(rand(@temp));
my $val = splice @temp, $index, 1;
push @shuffled_ints, $val;
}
}
sub test1_joerg
{
my ($ints, $evens) = @_;
grep { my $f = $_; !scalar grep { $_ eq $f } @$evens } @$ints;
}
sub test2_japhy
{
my ($ints, $evens) = @_;
my %in_evens;
@in_evens{@$evens} = ();
grep !exists($in_evens{$_}), @$ints;
}
sub test3_robin
{
my ($ints, $evens) = @_;
my %ints;
@ints{@$ints} = ();
delete @ints{@$evens};
keys %ints;
}
sub test4_adam
{
# @ints and @evens must both be sorted
# can't assume that they are, so do it ourselves
my (@ints, @evens);
@ints = sort { $a <=> $b } @{$_[0]};
@evens = sort { $a <=> $b } @{$_[1]};
my ($idx1, $idx2, @odds);
$idx1 = $idx2 = 0;
while (($idx1 < @ints) && ($idx2 < @evens))
{
my $c = $ints[$idx1] <=> $evens[$idx2];
if ($c < 0)
{
push(@odds, $ints[$idx1]);
++$idx1;
}
elsif ($c > 0)
{
++$idx2;
}
else
{
++$idx1;
++$idx2;
}
}
if ($idx1 < @ints)
{
push(@odds, @ints[$idx1 .. $#ints]);
}
@odds;
}
sub test5_bits
{
# hinted at by Adam Rice
my ($ints, $evens) = @_;
my $bits = 0;
vec($bits, $_, 1) = 1 foreach @$ints;
vec($bits, $_, 1) = 0 foreach @$evens;
my @odds = grep { vec($bits, $_, 1) } @$ints;
@odds;
}
timethese(0,
{
test1_joerg_big => sub
{ test1_joerg(\@big_ints, \@big_evens) },
test1_joerg_shuffled => sub
{ test1_joerg(\@shuffled_ints, \@big_evens) },
test1_joerg_small => sub
{ test1_joerg(\@small_ints, \@small_evens) },
test2_japhy_big => sub
{ test2_japhy(\@big_ints, \@big_evens) },
test2_japhy_shuffled => sub
{ test2_japhy(\@shuffled_ints, \@big_evens) },
test2_japhy_small => sub
{ test2_japhy(\@small_ints, \@small_evens) },
test3_robin_big => sub
{ test3_robin(\@big_ints, \@big_evens) },
test3_robin_shuffled => sub
{ test3_robin(\@shuffled_ints, \@big_evens) },
test3_robin_small => sub
{ test3_robin(\@small_ints, \@small_evens) },
test4_adam_big => sub { test4_adam(\@big_ints,
\@big_evens) },
test4_adam_shuffled => sub
{ test4_adam(\@shuffled_ints, \@big_evens) },
test4_adam_small => sub
{ test4_adam(\@small_ints, \@small_evens) },
test5_bits_big => sub { test5_bits(\@big_ints,
\@big_evens) },
test5_bits_shuffled => sub
{ test5_bits(\@shuffled_ints, \@big_evens) },
test5_bits_small => sub
{ test5_bits(\@small_ints, \@small_evens) },
}
);