On Fri, Aug 19, 2016 at 2:22 PM Chas. Owens <chas.ow...@gmail.com> wrote:

> Truth.  If you are checking in lots of things exist a hashset might be a
> better way to go:
>
> my %hashset = map { ($_ => undef) } (3,1,4,2,9,0);
>
> my $found = exists $hashset{4} || 0;
> my $not_found = exists $hashset{10} || 0;
>
> By setting the value of the hash to be undef, you take up less space than
> setting it any other value
>

Here is the result of a benchmark.

cached_hashset: 1
hashset: 1
any: 1
grep: 1

100 items

                     Rate      hashset          grep          any
cached_hashset
hashset           13953/s           --          -93%         -95%
 -100%
grep             212031/s        1420%            --         -25%
-99%
any              283002/s        1928%           33%           --
-99%
cached_hashset 33211602/s      237921%        15564%       11635%
  --

1000 items

                     Rate      hashset         grep           any
cached_hashset
hashset            1496/s           --         -93%          -95%
 -100%
grep              21900/s        1364%           --          -26%
 -100%
any               29658/s        1882%          35%            --
 -100%
cached_hashset 27036512/s     1807036%      123354%        91061%
  --

10000 items

                     Rate      hashset          grep          any
cached_hashset
hashset             108/s           --          -95%         -96%
 -100%
grep               2197/s        1941%            --         -21%
 -100%
any                2796/s        2497%           27%           --
 -100%
cached_hashset 36489558/s    33894645%      1660653%     1304955%
  --

>From this, you can see that any is the best choice if you are going to
search the list once, but a hashset is the best choice (by an insane
margin) if you are going to search the list many times.  Here is the code
that generated the benchmark:

#!/usr/bin/perl

use strict;
use warnings;

use Benchmark;

use List::Util qw/any/;

my @a = (1);
my %cache = (1 => undef);

my %subs = (
grep => sub {
return scalar grep { $_ == 1 } @a;
},
any => sub {
return any { $_ == 1 } @a;
},
hashset => sub {
my %hashset = map { ($_ => undef) } @a;
return exists $hashset{1};
},
cached_hashset => sub {
return exists $cache{1};
}
);

for my $sub (keys %subs) {
    print "$sub: ", $subs{$sub}(), "\n";
}

for my $n (100, 1_000, 10_000) {
@a = reverse 1 .. $n;

print "\n$n items\n\n";
Benchmark::cmpthese -2, \%subs;
}

Reply via email to