Re: regexp quickie

Ronald J Kimball Wed, 27 Jun 2007 08:01:48 -0700

On Wed, Jun 27, 2007 at 05:45:54AM -0700, Phil Carmody wrote:
> Say I had a string satisfying /^[A-Z_]{6}$/, but not equal to '______'
> and I wish to extract from that the 1 or 2 letters which are closest to
> the n-th character in the string. Is there a simple regexp to perform
> that task?
> 
> e.g.
> if the string=A_Z_K_ then:
> if n=1, then I want 'A' (or 'AA', not fussed)
> if n=2, then I want 'AZ'
> if n=3, then I want 'Z' (or 'ZZ', not fussed)
> if n=4, then I want 'ZK'
> if n=5 or 6, then I want 'K' (or 'KK', not fussed)
> 
> I can see how to do it with the concatenation of two matches from two
> substrs, but that's barely simpler than a naive loop over each character
> forwards and backwards.


Well, I wouldn't exactly call this regex simple...  But I have come up with
one that does it:

for (qw/ A_Z_K_ A_____ _____K /) {
  print "$_\n";
  for my $n (1 .. 6) {
    my $r = $n - 1;
    print "$n: ";
    /^(?(?=.{0,$r}[A-Z]).{0,$r}|.*)([A-Z])(?(?<!^..{$r}).*?([A-Z]|$))/
      && print "$1 $2";
    print "\n";
  }
}

This has the advantage of always putting the matched characters in $1 and
$2.  (Note that $1 is always set; if there is no letter at or before the
position, the greedy .* leaves the last letter in the string in $1, which
is the nearest one only when a single letter follows the position, and $2
will be empty.)
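
For readers who want to see how the pieces fit, here is a sketch of the same pattern laid out with the /x modifier, one step per line. The wrapper name closest_pair is hypothetical (it is not from the original post), and returning an empty string in place of an unset $2 is an assumption made so the caller never sees undef.

```perl
use strict;
use warnings;

# Hypothetical wrapper, not part of the original post: applies the same
# pattern as above and returns the two captures as a list.
sub closest_pair {
    my ($str, $n) = @_;
    my $r = $n - 1;
    return unless $str =~ m{
        ^
        (?(?=.{0,$r}[A-Z])      # is there a letter at or before position n?
            .{0,$r}             #   yes: greedily skip up to n-1 characters
          | .*                  #   no: greedily skip ahead instead
        )
        ([A-Z])                 # capture 1: the letter that follows the skip
        (?(?<!^..{$r})          # unless that letter sits exactly at position n
            .*?([A-Z]|$)        #   capture 2: the next letter, or empty at end
        )
    }x;
    # Capture 2 is unset when the conditional is skipped; normalise to ''.
    return ($1, defined $2 ? $2 : '');
}

for my $n (1 .. 6) {
    my ($first, $second) = closest_pair('A_Z_K_', $n);
    print "$n: $first $second\n";
}
```

The layout changes nothing about what the pattern matches; it only separates the two conditionals so each branch can be read on its own.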


Here are two other approaches:

(/^.{$r}([A-Z])/ || /^.{0,$r}([A-Z]).*?([A-Z]|$)/ || /^.*([A-Z])/)
  && print "$1 $2";
is simpler, but uses three separate regular expressions.

/^.{$r}([A-Z])|^.{0,$r}([A-Z]).*?([A-Z]|$)|^.*([A-Z])/
  && print $1 || $4 || "$2 $3";
uses a single regular expression, but the results will be in $1, or in $2
and $3, or in $4.  (And if digits were allowed the print logic would need
to be modified.)
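
As a sketch of how the print logic for the single-regex version could be made independent of which alternative matched, the captures can be filtered with defined and length instead of chained with ||. The helper name letters_near is hypothetical, not something from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original post: returns the one or
# two letters found by the single-regex approach for 1-based position $n.
sub letters_near {
    my ($str, $n) = @_;
    my $r = $n - 1;
    return unless
        $str =~ /^.{$r}([A-Z])|^.{0,$r}([A-Z]).*?([A-Z]|$)|^.*([A-Z])/;
    # Only the groups of the alternative that matched are defined, and the
    # third group may be an empty string; keep the non-empty ones.
    return grep { defined($_) && length($_) } ($1, $2, $3, $4);
}

for my $n (1 .. 6) {
    print "$n: ", join(' ', letters_near('A_Z_K_', $n)), "\n";
}
```

Because the filter tests definedness and length rather than truth, a captured "0" would survive it, which is the case the || chain would mishandle if digits were allowed.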


Ronald
