Stephen A. Jarjoura wrote:
> I needed to count the number of occurrences of a substring within a string...
>
> $buffer =~ s/(\<transfer)/$transfer_count++;$1/sg;
Don't you need a /e modifier for that to work?
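To illustrate the question, here is a minimal sketch (the sample buffer and the variable names other than $transfer_count are made up for this example). Without /e the replacement is interpolated like a double-quoted string, so the counter never increments and its current value is written into the buffer. With /e the replacement is evaluated as Perl code and its last value, $1, is substituted back.

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the original post.
my $original = '<transfer a><transfer b>';

# Without /e: the replacement is an interpolated string, not code.
my $transfer_count = 0;
(my $no_e = $original) =~ s/(<transfer)/$transfer_count++;$1/sg;

# With /e: the replacement runs as code; $1 is its value, so the text is unchanged.
my $count_e = 0;
(my $with_e = $original) =~ s/(<transfer)/$count_e++;$1/seg;

print "without /e: count=$transfer_count buffer=$no_e\n";
print "with /e:    count=$count_e buffer=$with_e\n";
```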
> Did I miss some obvious, and easier method?
my @count = $string =~ m/$substr/g;
print scalar @count, "\n";
This is a little more Perl-ish than Ron's variation, and although it is 85%
faster than your approach, it still ranks among the slowest. Using
index() instead of an RE is the fastest (roughly 4.5x to 5x the rate of
your approach; see benchmarks below), though far less elegant looking, so
overall Ron's variation is probably the best blend of clarity and
performance (about 3.4x the rate of your approach).
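One caveat when comparing the two families: the regex versions treat $substr as a pattern, while index() treats it as a literal string, so they only agree when the substring contains no regex metacharacters. A minimal sketch of making the regex version literal with \Q...\E (the helper names count_pattern and count_literal are hypothetical, used only here):

```perl
use strict;
use warnings;

# Treats $substr as a regex pattern, as the regex-based counts do.
sub count_pattern {
    my ($string, $substr) = @_;
    my $count = 0;
    $count++ while $string =~ m/$substr/g;
    return $count;
}

# Quotes metacharacters in $substr so it is matched literally, like index().
sub count_literal {
    my ($string, $substr) = @_;
    my $count = 0;
    $count++ while $string =~ m/\Q$substr\E/g;
    return $count;
}

print count_pattern('a.c abc', 'a.c'), "\n";  # '.' matches any character
print count_literal('a.c abc', 'a.c'), "\n";  # only the literal "a.c" matches
```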
As an aside, you'd think this:
my $count = $string =~ m/$substr/g;
or this:
my $count = ($string =~ m/($substr)/g);
would return the count, but they don't (the parentheses in the second
form do not create list context). In scalar context m//g returns only
true/false, indicating the success of the most recent match. I assume
that's so m//g works as expected in while loops.
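If a single expression is wanted, one common idiom (not among the variations benchmarked in this message) is to assign the match to an empty list and take that assignment in scalar context, which yields the number of elements the match returned. A minimal sketch, where count_matches is a hypothetical helper name:

```perl
use strict;
use warnings;

# count_matches is a hypothetical name used only for this illustration.
sub count_matches {
    my ($string, $substr) = @_;
    # The empty list puts m//g in list context; a list assignment
    # evaluated in scalar context gives the number of values assigned.
    my $count = () = $string =~ m/$substr/g;
    return $count;
}

print count_matches('abc123abc456abc789abc', 'abc'), "\n";
```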
Below are a few variations and the resulting benchmarks.
-Tom
use strict;
use Benchmark qw(:all);
sub count1 {
    my $string = shift;
    my $substr = shift;
    my @count = $string =~ m/$substr/g;
    return scalar @count;
}
sub count2 {
    my $string = shift;
    my $substr = shift;
    my $count = 0;
    for (my $pos = 0;
         $pos = index($string, $substr, $pos), $pos != -1;
         $pos += length($substr), $count++) {}
    return $count;
}
sub count3 {
    my $string = shift;
    my $substr = shift;
    my $length = length($substr);
    my $count = 0;
    my $pos = 0;    # start searching at offset 0
    while ( ($pos = index($string, $substr, $pos)) != -1 ) {
        $pos += $length;
        $count++;
    }
    return $count;
}
# Ron's variation
sub count4 {
    my $string = shift;
    my $substr = shift;
    my $count = 0;
    $count++ while $string =~ m/$substr/g;
    return $count;
}
# original
sub count5 {
    my $string = shift;
    my $substr = shift;
    my $count = 0;
    $string =~ s/($substr)/$count++;$1/egs;
    return $count;
}
my $x = 'abc123abc456abc789abc';
my $s = 'abc';
print "Returned counts: \n",
    'Count1 => ', count1($x,$s), "\n",
    'Count2 => ', count2($x,$s), "\n",
    'Count3 => ', count3($x,$s), "\n",
    'Count4 => ', count4($x,$s), "\n",
    'Count5 => ', count5($x,$s), "\n";
cmpthese(10000, {
    'Count1' => sub {count1($x,$s)},
    'Count2' => sub {count2($x,$s)},
    'Count3' => sub {count3($x,$s)},
    'Count4' => sub {count4($x,$s)},
    'Count5' => sub {count5($x,$s)}
});
% perl string_count.pl
Returned counts:
Count1 => 4
Count2 => 4
Count3 => 4
Count4 => 4
Count5 => 4
          Rate Count5 Count1 Count4 Count2 Count3
Count5  5043/s     --   -46%   -71%   -78%   -80%
Count1  9328/s    85%     --   -46%   -59%   -63%
Count4 17212/s   241%    85%     --   -24%   -31%
Count2 22727/s   351%   144%    32%     --    -9%
Count3 24938/s   395%   167%    45%    10%     --
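For reading the table: each cell appears to be the row's rate relative to the column's rate, expressed as a percentage difference. A small sketch that recomputes two cells from the rates printed above (percent_vs is a hypothetical helper name, and the formula is my reading of the output rather than something stated in it):

```perl
use strict;
use warnings;

# Rates (iterations per second) copied from the benchmark output above.
my %rate = (
    Count5 => 5043,
    Count1 => 9328,
    Count4 => 17212,
    Count2 => 22727,
    Count3 => 24938,
);

# Percentage by which $row is faster (positive) or slower (negative) than $col.
sub percent_vs {
    my ($row, $col) = @_;
    return sprintf '%.0f%%', 100 * ($rate{$row} / $rate{$col} - 1);
}

print "Count3 vs Count5: ", percent_vs('Count3', 'Count5'), "\n";
print "Count5 vs Count3: ", percent_vs('Count5', 'Count3'), "\n";
```

So "395%" for Count3 against Count5 means Count3 runs at nearly five times the rate of Count5, not four times.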