On Mon, Oct 1, 2012 at 5:15 PM, Florian Huber <florian_hu...@gmx.at> wrote:
>
> I'm trying to extract a DNA sequence out of a larger string, i.e. the string
> is of the following structure:
> $string = "/NOTNEEDED/*ACGACGGGTTCAAGGCAG*/NOTNEEDED/"
> But when I do
> $string =~ /[ACGT]/;
> it matches only the last letter, i.e. "G". Why doesn't it start at the
> beginning?
>
> My confusion was complete when I tried
>
> $string =~ /[ACGT]{5}/;
>
> now it matches 5 letters, but this time from the beginning, i.e.: ACGAC.

$ cat /tmp/g.pl
use strict;
use warnings;

my $string = "/NOTNEEDED/*ACGACGGGTTCAAGGCAG*/NOTNEEDED/";

## match the first A, C, G or T and replace it with an 'X'
if ( $string =~ s/([ACGT])/X/ ) {
  print "Matched: $1 in $string\n";
}
Macintosh-3:~ afbach$ perl /tmp/g.pl
Matched: T in /NOXNEEDED/*ACGACGGGTTCAAGGCAG*/NOTNEEDED/

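The square brackets create a character class, "either A or C or G or T",
which matches exactly one character. "*" makes it zero or more, "+" one
or more, and "{5}" exactly 5, each of which can be any of the four.
Perl also returns the leftmost match, so /[ACGT]/ stops at the first T
in "NOTNEEDED", and /[ACGT]{5}/ skips ahead to "ACGAC" because that is
the first place five of them occur in a row.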
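A small sketch (not from the original thread) that runs each quantifier against your sample string, so you can see the leftmost-match rule at work. Only the variable names are my own; the patterns are the ones discussed above.

```perl
use strict;
use warnings;

my $string = "/NOTNEEDED/*ACGACGGGTTCAAGGCAG*/NOTNEEDED/";

# Each match returns the leftmost place where the pattern can succeed.
my ($star) = $string =~ /([ACGT]*)/;    # zero or more: empty match at position 0
my ($one)  = $string =~ /([ACGT])/;     # exactly one: the T in NOTNEEDED
my ($plus) = $string =~ /([ACGT]+)/;    # one or more: still that lone T
my ($five) = $string =~ /([ACGT]{5})/;  # exactly five: first run long enough

print "star: '$star'\n";
print "one:  '$one'\n";
print "plus: '$plus'\n";
print "five: '$five'\n";
```

Note that "*" "succeeds" immediately on the "/" at position 0 by matching nothing, which is why it is rarely what you want on its own.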

If I'm understanding you, you want the run of [ACGT]s delimited by
non-ACGT characters. Your example has the /* ... */ markers, so

if ( $string =~ m#/\*([ACGT]+)\*/# ) {
  print "Matched: $1 in $string\n";
}

would work, but I doubt that's your real data. Is the part you want
simply the run made up only of ACGTs? This sort of works:
if ( $string =~ m/[^ACGT]([ACGT]+)[^ACGT]/ ) {
  print "Matched: $1 in $string\n";
}

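but on your sample string it captures just the "T" from "NOTNEEDED":
that T is also surrounded by non-ACGT characters, and it comes first.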
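One way around that, sketched here as an assumption rather than anything from the thread: collect every run of ACGTs with a global match in list context and keep the longest one. This only works if the sequence you want is reliably longer than any stray A/C/G/T runs in the surrounding text.

```perl
use strict;
use warnings;

my $string = "/NOTNEEDED/*ACGACGGGTTCAAGGCAG*/NOTNEEDED/";

# In list context, //g returns every capture: each maximal run of A/C/G/T
my @runs = $string =~ /([ACGT]+)/g;

# Assumption: the sequence you want is the longest run in the string
my ($longest) = sort { length($b) <=> length($a) } @runs;

print "runs: @runs\n";
print "longest: $longest\n";
```

If the string might contain no ACGTs at all, check that $longest is defined before using it.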


-- 

a

Andy Bach,
afb...@gmail.com
608 658-1890 cell
608 261-5738 wk

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/

