Re: set subtraction

Craig S. Cottingham Sat, 14 Jul 2001 06:19:24 -0700
On Friday, July 13, 2001, at 12:46 , Daniel S. Wilkerson wrote:

> Well, lets not be too theoretical.  For short lists,
> Jorg's way is probably faster.  Hash accesses look
> nice and neat, but lots is going on for each one.

Why speculate, when you can instrument...ate?

craigc@samantha:~/fwp$ ./setsub
Benchmark: running test1_joerg_big, test1_joerg_shuffled, 
test1_joerg_small, test2_japhy_big, test2_japhy_shuffled, 
test2_japhy_small, test3_robin_big, test3_robin_shuffled, 
test3_robin_small, test4_adam_big, test4_adam_shuffled, 
test4_adam_small, test5_bits_big, test5_bits_shuffled, 
test5_bits_small, each for at least 3 CPU seconds...
test1_joerg_big: 86 wallclock secs (84.00 usr +  0.00 sys = 
84.00 CPU) @  0.01/s (n=1)
             (warning: too few iterations for a reliable count)
test1_joerg_shuffled: 86 wallclock secs (84.31 usr +  0.00 sys = 
84.31 CPU) @  0.01/s (n=1)
             (warning: too few iterations for a reliable count)
test1_joerg_small:  3 wallclock secs ( 3.01 usr +  0.00 sys =  
3.01 CPU) @ 5515.61/s (n=16602)
test2_japhy_big:  3 wallclock secs ( 3.01 usr +  0.00 sys =  
3.01 CPU) @ 21.93/s (n=66)
test2_japhy_shuffled:  4 wallclock secs ( 3.15 usr +  0.00 
sys =  3.15 CPU) @ 21.27/s (n=67)
test2_japhy_small:  3 wallclock secs ( 3.11 usr +  0.00 sys =  
3.11 CPU) @ 3778.46/s (n=11751)
test3_robin_big:  3 wallclock secs ( 3.08 usr +  0.00 sys =  
3.08 CPU) @ 17.21/s (n=53)
test3_robin_shuffled:  3 wallclock secs ( 3.08 usr +  0.00 
sys =  3.08 CPU) @ 15.26/s (n=47)
test3_robin_small:  3 wallclock secs ( 3.15 usr +  0.00 sys =  
3.15 CPU) @ 2096.83/s (n=6605)
test4_adam_big:  3 wallclock secs ( 3.03 usr +  0.00 sys =  3.03 
CPU) @  3.96/s (n=12)
test4_adam_shuffled:  3 wallclock secs ( 3.14 usr +  0.00 sys =  
3.14 CPU) @  3.82/s (n=12)
test4_adam_small:  4 wallclock secs ( 3.16 usr +  0.00 sys =  
3.16 CPU) @ 4056.65/s (n=12819)
test5_bits_big:  3 wallclock secs ( 3.16 usr +  0.00 sys =  3.16 
CPU) @ 10.76/s (n=34)
test5_bits_shuffled:  3 wallclock secs ( 3.10 usr +  0.00 sys =  
3.10 CPU) @ 10.65/s (n=33)
test5_bits_small:  2 wallclock secs ( 3.00 usr +  0.00 sys =  
3.00 CPU) @ 7415.67/s (n=22247)
craigc@samantha:~/fwp$

Conclusions:
1. J�rg's method of using nested greps works really well for 
small sets, but degrades poorly.
2. Hashes degrade much more gracefully.
3. Adam's method of manually iterating through both sets 
degrades linearly, suggesting that it is bound by the 
performance of the Perl interpreter, not the data structures.
4. Bitfields, at least as I have implemented them, fall 
somewhere between nested greps and hashes. I suspect the 
explicit loops required to populate, manipulate, and extract the 
sets from the bitfield.

Full code below .sig.

--
Craig S. Cottingham
[EMAIL PROTECTED]
PGP key available from: 
<http://pgp.ai.mit.edu:11371/pks/lookup?op=get&search=0xA2FFBE41>
ID=0xA2FFBE41, fingerprint=6AA8 2E28 2404 8A95 B8FC 7EFC 136F 
0CEF A2FF BE41


===== snip here =====

#!/usr/bin/perl -w

use Benchmark qw(cmpthese timethese);

my @small_ints = (1..10);
my @small_evens = map { $_ * 2 } (1..(@small_ints / 2));
my @small_odds = map { ($_ * 2) - 1 } (1..(@small_ints / 2));
my @big_ints = (1..10_000);
my @big_evens = map { $_ * 2 } (1..(@big_ints / 2));
my @big_odds = map { ($_ * 2) - 1 } (1..(@big_ints / 2));

my @shuffled_ints;
{
     my @temp = @big_ints;
     while (@temp)
     {
         my $index = int(rand(@temp));
         my $val = splice @temp, $index, 1;
         push @shuffled_ints, $val;
     }
}

sub test1_joerg
{
     my ($ints, $evens) = @_;

     grep { my $f = $_; !scalar grep { $_ eq $f } @$evens } @$ints;
}

sub test2_japhy
{
     my ($ints, $evens) = @_;

     my %in_evens;
     @in_evens{@$evens} = ();
     grep !exists($in_evens{$_}), @$ints;
}

sub test3_robin
{
     my ($ints, $evens) = @_;

     my %ints;
     @ints{@$ints} = ();
     delete @ints{@$evens};
     keys %ints;
}

sub test4_adam
{
     # @ints and @evens must both be sorted
     # can't assume that they are, so do it ourselves

     my (@ints, @evens);
     @ints = sort { $a <=> $b } @{$_[0]};
     @evens = sort { $a <=> $b } @{$_[1]};

     my ($idx1, $idx2, @odds);
     $idx1 = $idx2 = 0;
     while (($idx1 < @ints) && ($idx2 < @evens))
     {
         my $c = $ints[$idx1] <=> $evens[$idx2];
         if ($c < 0)
         {
             push(@odds, $ints[$idx1]);
             ++$idx1;
         }
         elsif ($c > 0)
         {
             ++$idx2;
         }
         else
         {
             ++$idx1;
             ++$idx2;
         }
     }

     if ($idx1 < @ints)
     {
         push(@odds, @ints[$idx1 .. $#ints]);
     }

     @odds;
}

sub test5_bits
{
     # hinted at by Adam Rice

     my ($ints, $evens) = @_;

     my $bits = 0;
     vec($bits, $_, 1) = 1 foreach @$ints;
     vec($bits, $_, 1) = 0 foreach @$evens;

     my @odds = grep { vec($bits, $_, 1) } @$ints;
     @odds;
}

timethese(0,
     {
         test1_joerg_big         =>  sub 
{ test1_joerg(\@big_ints, \@big_evens) },
         test1_joerg_shuffled    =>  sub 
{ test1_joerg(\@shuffled_ints, \@big_evens) },
         test1_joerg_small       =>  sub 
{ test1_joerg(\@small_ints, \@small_evens) },
         test2_japhy_big         =>  sub 
{ test2_japhy(\@big_ints, \@big_evens) },
         test2_japhy_shuffled    =>  sub 
{ test2_japhy(\@shuffled_ints, \@big_evens) },
         test2_japhy_small       =>  sub 
{ test2_japhy(\@small_ints, \@small_evens) },
         test3_robin_big         =>  sub 
{ test3_robin(\@big_ints, \@big_evens) },
         test3_robin_shuffled    =>  sub 
{ test3_robin(\@shuffled_ints, \@big_evens) },
         test3_robin_small       =>  sub 
{ test3_robin(\@small_ints, \@small_evens) },
         test4_adam_big          =>  sub { test4_adam(\@big_ints, 
\@big_evens) },
         test4_adam_shuffled     =>  sub 
{ test4_adam(\@shuffled_ints, \@big_evens) },
         test4_adam_small        =>  sub 
{ test4_adam(\@small_ints, \@small_evens) },
         test5_bits_big          =>  sub { test5_bits(\@big_ints, 
\@big_evens) },
         test5_bits_shuffled     =>  sub 
{ test5_bits(\@shuffled_ints, \@big_evens) },
         test5_bits_small        =>  sub 
{ test5_bits(\@small_ints, \@small_evens) },
     }
);
Re: set subtraction

Reply via email to