On Fri, Aug 19, 2016 at 2:22 PM Chas. Owens <chas.ow...@gmail.com> wrote:
> Truth. If you are checking in lots of things exist a hashset might be a > better way to go: > > my %hashset = map { ($_ => undef) } (3,1,4,2,9,0); > > my $found = exists $hashset{4} || 0; > my $not_found = exists $hashset{10} || 0; > > By setting the value of the hash to be undef, you take up less space than > setting it any other value > Here is the result of a benchmark. cached_hashset: 1 hashset: 1 any: 1 grep: 1 100 items Rate hashset grep any cached_hashset hashset 13953/s -- -93% -95% -100% grep 212031/s 1420% -- -25% -99% any 283002/s 1928% 33% -- -99% cached_hashset 33211602/s 237921% 15564% 11635% -- 1000 items Rate hashset grep any cached_hashset hashset 1496/s -- -93% -95% -100% grep 21900/s 1364% -- -26% -100% any 29658/s 1882% 35% -- -100% cached_hashset 27036512/s 1807036% 123354% 91061% -- 10000 items Rate hashset grep any cached_hashset hashset 108/s -- -95% -96% -100% grep 2197/s 1941% -- -21% -100% any 2796/s 2497% 27% -- -100% cached_hashset 36489558/s 33894645% 1660653% 1304955% -- >From this, you can see that any is the best choice if you are going to search the list once, but a hashset is the best choice (by an insane margin) if you are going to search the list many times. Here is the code that generated the benchmark: #!/usr/bin/perl use strict; use warnings; use Benchmark; use List::Util qw/any/; my @a = (1); my %cache = (1 => undef); my %subs = ( grep => sub { return scalar grep { $_ == 1 } @a; }, any => sub { return any { $_ == 1 } @a; }, hashset => sub { my %hashset = map { ($_ => undef) } @a; return exists $hashset{1}; }, cached_hashset => sub { return exists $cache{1}; } ); for my $sub (keys %subs) { print "$sub: ", $subs{$sub}(), "\n"; } for my $n (100, 1_000, 10_000) { @a = reverse 1 .. $n; print "\n$n items\n\n"; Benchmark::cmpthese -2, \%subs; }