On Wed, Jun 27, 2007 at 05:45:54AM -0700, Phil Carmody wrote:
> Say I had a string satisfying /^[A-Z_]{6}$/, but not equal to '______'
> and I wish to extract from that the 1 or 2 letters which are closest to
> the n-th character in the string. Is there a simple regexp to perform
> that task?
>
> e.g.
> if the string=A_Z_K_ then:
> if n=1, then I want 'A' (or 'AA', not fussed)
> if n=2, then I want 'AZ'
> if n=3, then I want 'Z' (or 'ZZ', not fussed)
> if n=4, then I want 'ZK'
> if n=5 or 6, then I want 'K' (or 'KK', not fussed)
>
> I can see how to do it with the concatenation of two matches from two
> substrs, but that's barely simpler than a naive loop over each character
> forwards and backwards.

Well, I wouldn't exactly call this regex simple, but I have come up with
one that does it:

    for (qw/ A_Z_K_ A_____ _____K /) {
        print "$_\n";
        for my $n (1 .. 6) {
            my $r = $n - 1;
            print "$n: ";
            /^(?(?=.{0,$r}[A-Z]).{0,$r}|.*)([A-Z])(?(?<!^..{$r}).*?([A-Z]|$))/
                && print "$1 $2";
            print "\n";
        }
    }

This has the advantage of always putting the matched characters in $1 and
$2. (Note that $1 is always set; if there is no letter at or before the
position, the greedy .* makes $1 the last letter in the string, which is
the letter after the position in the examples above, and $2 will be
empty.)
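
In case the one-liner is hard to read, here is a sketch of the same
pattern spelled out with /x and wrapped in a sub. The name
nearest_letters is made up for illustration, and it returns the non-empty
captures as a list instead of printing them.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original one-liner): returns the
# non-empty captures for a string and a 1-based position n.
sub nearest_letters {
    my ($str, $n) = @_;
    my $r = $n - 1;    # number of characters before position n
    return unless $str =~ m/
        ^
        (?(?=.{0,$r}[A-Z])    # is there a letter at or before position n?
            .{0,$r}           #   yes: skip as far as we can within that window
          | .*                #   no: skip as far as we can in the whole string
        )
        ([A-Z])               # first capture: the letter we landed on
        (?(?<!^..{$r})        # unless that letter sits exactly at position n
            .*?([A-Z]|$)      #   second capture: next letter, or empty at the end
        )
    /x;
    # The second capture may be undefined or empty, so filter those out.
    return grep { defined($_) && length($_) } ($1, $2);
}

print join(' ', nearest_letters('A_Z_K_', $_)), "\n" for 1 .. 6;
```

For A_Z_K_ this should print A, A Z, Z, Z K, K and K for n = 1 .. 6.
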
Here are two other approaches:
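
    ( /^.{$r}([A-Z])/ || /^.{0,$r}([A-Z]).*?([A-Z]|$)/ || /^.*([A-Z])/ )
        && print "$1 $2";

is simpler, but uses three separate regular expressions. (The parentheses
matter: && binds tighter than ||, so without them the print would only be
attached to the third match.)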

    /^.{$r}([A-Z])|^.{0,$r}([A-Z]).*?([A-Z]|$)|^.*([A-Z])/
        && print $1 || $4 || "$2 $3";

uses a single regular expression, but the results will be in $1, or in $2
and $3, or in $4. (And if digits were allowed the print logic would need
to be modified.)
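
As a rough sketch of that modification: pick the result by which capture
is defined rather than by which one is true, so that a "0" would not be
skipped. The sub name pick_letters is made up for illustration, and the
character class is left as [A-Z]; only the selection logic changes.

```perl
use strict;
use warnings;

# Hypothetical helper: same single pattern as above, but the result is
# chosen with defined() instead of ||, so a false value such as "0"
# would still be returned if the character class allowed digits.
sub pick_letters {
    my ($str, $n) = @_;
    my $r = $n - 1;
    return unless
        $str =~ /^.{$r}([A-Z])|^.{0,$r}([A-Z]).*?([A-Z]|$)|^.*([A-Z])/;
    return $1 if defined $1;    # letter exactly at position n
    return $4 if defined $4;    # no letter at or before position n
    return "$2 $3";             # letter before n, then next letter (may be empty)
}

print "$_: ", pick_letters('A_Z_K_', $_), "\n" for 1 .. 6;
```
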
Ronald