John W. Krahn wrote:
> Rob Dixon wrote:
> >
> > John W. Krahn wrote:
> > >
> > > Better to use zero-width positive look-ahead and look-behind.
> > >
> > > $ perl -le'
> > > $_ = ":B000:L520:M260:M:88:8:M602:";
> > > $string_to_match = qr/\w+/;
> > > $count = () = /(?<=:)$string_to_match(?=:)/g;
> > > print $count;
> > > '
> > > 7
> > > $ perl -le'
> > > $_ = ":B000:L520:M260:M:88:8:M602:";
> > > $string_to_match = "";
> > > $count = () = /(?<=:)$string_to_match(?=:)/g;
> > > print $count;
> > > '
> > > 0
> >
> > This isn't a problem if $string_to_match is non-null, but
> > only if it is a regex or, more specifically, contains regex
> > metacharacters. If this is the case I would personally
> > rather go for:
> >
> >     $count = () = /\W\Q$string_to_match\E\W/g;
> >
> > but my original code was written on the assumption that
> > Scott's substrings were always alphanumeric, as per his
> > sample. Since he has since posted that the solution was
> > effective I think I was right, but thanks for pointing this
> > out.
>
> What I was pointing out is that \W "eats" a character on either side of
> $string_to_match while the zero-width assertions do not.  So assuming:
>
> $_ = ':XXX:XXX:YYY:YYY:XXX:XXX:';
> $string_to_match = 'YYY';
>
> Using \W to anchor the match will only match once:
>
> :XXX:XXX:YYY:YYY:XXX:XXX:
>         ^^^^^
>      first match

[snip]

Ah, the light dawns, thank you John! You'd better make some changes
after all, Scott. The effect of my mistake is to miss the second of a
pair of adjacent identical substrings. This could have gone unnoticed
for a very long time, as such pairs are probably very unusual.
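
To make the difference concrete, here is a minimal sketch using John's
sample string. The variable names $consuming and $lookaround are mine,
purely for illustration, and the \Q...\E quoting assumes the substring
is meant as literal text rather than as a regex.

```perl
use strict;
use warnings;

my $str             = ':XXX:XXX:YYY:YYY:XXX:XXX:';
my $string_to_match = 'YYY';

# \W on both sides consumes the delimiters, so the ':' that ends the
# first match is no longer available to start the second one.
my $consuming = () = $str =~ /\W\Q$string_to_match\E\W/g;

# Zero-width assertions only test for the delimiters without consuming
# them, so both of the adjacent identical substrings are found.
my $lookaround = () = $str =~ /(?<=\W)\Q$string_to_match\E(?=\W)/g;

print "consuming:   $consuming\n";    # prints 1
print "look-around: $lookaround\n";   # prints 2
```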

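One refinement that I would go for is to avoid the look-behind, which
can be expensive in a regex (although probably not in this context).
Only the trailing delimiter needs to be left unconsumed, since it then
serves as the leading \W of the next match.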

Instead of this:
    my @match = /\W($string_to_match)\W/g;
use this:
    my @match = /\W($string_to_match)(?=\W)/g;

    $_ = ':XXX:XXX:YYY:YYY:XXX:XXX:';
    my $string_to_match = 'XXX';
    my @match = /\W($string_to_match)(?=\W)/g;
    my $count = @match;
    print "$count\n";

output

    4
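
As a quick check against the case John raised, the same pattern can be
run on the adjacent 'YYY' pair. This is only a sketch: the \Q...\E
quoting is an addition of mine, on the assumption that the substring is
literal text and not a regex.

```perl
use strict;
use warnings;

my $str             = ':XXX:XXX:YYY:YYY:XXX:XXX:';
my $string_to_match = 'YYY';

# The leading \W is consumed, but the trailing delimiter is only tested
# by the look-ahead, so it is still there to begin the next match.
my @match = $str =~ /\W(\Q$string_to_match\E)(?=\W)/g;
my $count = @match;

print "$count\n";    # prints 2
```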

Cheers,

Rob



