Re: [Boston.pm] count substrings within strings

Tom Metro Wed, 10 Aug 2005 09:32:59 -0700

Stephen A. Jarjoura wrote:
> I needed to count the number of occurrences of a substring within a string...
>  
> $buffer =~ s/(\<transfer)/$transfer_count++;$1/sg;


Don't you need a /e modifier for that to work?
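
To make the question concrete, here is a minimal sketch comparing the
substitution with and without /e. The sub names and the sample buffer
are made up for illustration. Without /e the replacement is treated as
an interpolated string, so "$count++" is never run as code and the
buffer gets rewritten instead; with /e the replacement is evaluated as
Perl, the counter is bumped, and $1 is put back unchanged.

```perl
use strict;
use warnings;

# Hypothetical helpers; names and sample data are illustrative only.

# No /e: the replacement is an interpolated string, so each match is
# replaced by the value of $count, a literal "++;", and then $1.
sub count_without_e {
    my ($buffer) = @_;
    my $count = 0;
    $buffer =~ s/(<transfer)/$count++;$1/sg;
    return ($count, $buffer);
}

# With /e: the replacement is evaluated as code. $count++ runs once per
# match and the last expression, $1, becomes the replacement text.
sub count_with_e {
    my ($buffer) = @_;
    my $count = 0;
    $buffer =~ s/(<transfer)/$count++;$1/seg;
    return ($count, $buffer);
}

my $sample = '<transfer a><transfer b>';
my ($n1, $b1) = count_without_e($sample);
my ($n2, $b2) = count_with_e($sample);
print "without /e: count=$n1 buffer=$b1\n";
print "with /e:    count=$n2 buffer=$b2\n";
```

So the /e version counts correctly and leaves the text intact, while
the version without /e leaves the counter at zero and alters the text.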


> Did I miss some obvious, and easier method?

     my @count = $string =~ m/$substr/g;
     print scalar @count, "\n";

This is a little more Perl-ish than Ron's variation, and although it is
85% faster than your approach, it is still the second slowest of the
five variations benchmarked below. Using index() instead of an RE is the
fastest (roughly 4.5x to 5x the rate of your approach), though far less
elegant looking. Overall, Ron's variation is probably the best blend of
clarity and performance (about 3.4x the rate of your approach).
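
One practical difference worth keeping in mind when choosing between
the two families: the RE variations treat $substr as a pattern, while
index() treats it as a literal string. The sketch below illustrates
that under the assumption that the substring may contain regex
metacharacters. The sub names and sample data are invented for this
example; \Q...\E (quotemeta) is the standard way to make the RE version
match literally.

```perl
use strict;
use warnings;

# Hypothetical helpers; names and sample data are illustrative only.

# $substr is interpolated as a pattern, so "." matches any character.
sub count_regex_raw {
    my ($string, $substr) = @_;
    my $count = 0;
    $count++ while $string =~ m/$substr/g;
    return $count;
}

# \Q...\E escapes metacharacters, so $substr is matched literally.
sub count_regex_quoted {
    my ($string, $substr) = @_;
    my $count = 0;
    $count++ while $string =~ m/\Q$substr\E/g;
    return $count;
}

# index() always searches for the literal string.
sub count_index {
    my ($string, $substr) = @_;
    my $length = length($substr);
    return 0 if $length == 0;    # avoid looping forever on an empty substring
    my ($count, $pos) = (0, 0);
    while (($pos = index($string, $substr, $pos)) != -1) {
        $pos += $length;
        $count++;
    }
    return $count;
}

my $text = 'a.b axb a.b';
print 'raw regex:    ', count_regex_raw($text, 'a.b'), "\n";     # 3
print 'quoted regex: ', count_regex_quoted($text, 'a.b'), "\n";  # 2
print 'index:        ', count_index($text, 'a.b'), "\n";         # 2
```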


As an aside, you'd think this:

     my $count = $string =~ m/$substr/g;
or this:
     my $count = ($string =~ m/($substr)/g);

would work, but they don't. In scalar context m//g only returns 
true/false indicating the success of the most recent match. I assume 
that's so m//g works as expected in while loops.
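
For completeness, here is a small sketch of the scalar-context
behavior next to one way to get a count without a temporary array:
assigning the match to an empty list first. A list assignment evaluated
in scalar context yields the number of elements on its right-hand side.
The sub names are made up for illustration, and this variant is not
part of the benchmarks below.

```perl
use strict;
use warnings;

# Hypothetical helpers; names are illustrative only.

# Scalar context: m//g reports only whether the next match succeeded.
sub match_flag {
    my ($string, $substr) = @_;
    my $flag = $string =~ m/\Q$substr\E/g;
    return $flag ? 1 : 0;
}

# "= () =" forces list context on the match; the outer scalar
# assignment then receives the number of matches returned.
sub count_list_assign {
    my ($string, $substr) = @_;
    my $count = () = $string =~ m/\Q$substr\E/g;
    return $count;
}

my $x = 'abc123abc456abc789abc';
print 'scalar m//g:  ', match_flag($x, 'abc'), "\n";         # 1
print 'list assign:  ', count_list_assign($x, 'abc'), "\n";  # 4
```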


Below are a few variations and the resulting benchmarks.

  -Tom


use strict;
use Benchmark qw(:all);

sub count1 {
   my $string = shift;
   my $substr = shift;
     my @count = $string =~ m/$substr/g;
     return scalar @count;
}

sub count2 {
   my $string = shift;
   my $substr = shift;
     my $count = 0;
     for (my $pos = 0;
          ($pos = index($string, $substr, $pos)) != -1;
          $pos += length($substr), $count++) {}
     return $count;
}

sub count3 {
   my $string = shift;
   my $substr = shift;
     my $length = length($substr);
     my $count = 0;
     my $pos = 0;   # start at offset 0 rather than passing undef to index()
     while ( ($pos = index($string,$substr,$pos)) != -1) {
        $pos += $length;
        $count++;
     }
     return $count;
}

# Ron's variation
sub count4 {
   my $string = shift;
   my $substr = shift;

     my $count = 0;
     $count++ while $string =~ m/$substr/g;
     return $count;
}

# original
sub count5 {
   my $string = shift;
   my $substr = shift;

     my $count = 0;
     $string =~ s/($substr)/$count++;$1/egs;
     return $count;
}


my $x = 'abc123abc456abc789abc';
my $s = 'abc';

print "Returned counts: \n",
     'Count1 => ', count1($x,$s), "\n",
     'Count2 => ', count2($x,$s), "\n",
     'Count3 => ', count3($x,$s), "\n",
     'Count4 => ', count4($x,$s), "\n",
     'Count5 => ', count5($x,$s), "\n";

cmpthese(10000, {
     'Count1' => sub {count1($x,$s)},
     'Count2' => sub {count2($x,$s)},
     'Count3' => sub {count3($x,$s)},
     'Count4' => sub {count4($x,$s)},
     'Count5' => sub {count5($x,$s)}
});




% perl string_count.pl
Returned counts:
Count1 => 4
Count2 => 4
Count3 => 4
Count4 => 4
Count5 => 4

           Rate Count5 Count1 Count4 Count2 Count3
Count5  5043/s     --   -46%   -71%   -78%   -80%
Count1  9328/s    85%     --   -46%   -59%   -63%
Count4 17212/s   241%    85%     --   -24%   -31%
Count2 22727/s   351%   144%    32%     --    -9%
Count3 24938/s   395%   167%    45%    10%     --

 
_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm
