On Mon, Oct 1, 2012 at 5:15 PM, Florian Huber <florian_hu...@gmx.at> wrote:
>
> My confusion was complete when I tried
>
> $string =~ /[ACGT]{5}/;
>
> now it matches 5 letters, but this time from the beginning, i.e.: ACGAC.
>
> I'm trying to extract a DNA sequence out of a larger string, i.e. the string
> is of the following structure:
>
> $string = "/NOTNEEDED/*ACGACGGGTTCAAGGCAG*/NOTNEEDED/"
>
> But when I do
>
> $string =~ /[ACGT]/;
>
> it matches only the last letter, i.e. "G". Why doesn't it start at the
> beginning?

$ cat /tmp/g.pl
use strict;
use warnings;

my $string = "/NOTNEEDED/*ACGACGGGTTCAAGGCAG*/NOTNEEDED/";
## match the first [ACGT] character and replace it with an 'X'
if ( $string =~ s/([ACGT])/X/ ) {
    print "Matched: $1 in $string\n";
}

Macintosh-3:~ afbach$ perl /tmp/g.pl
Matched: T in /NOXNEEDED/*ACGACGGGTTCAAGGCAG*/NOTNEEDED/

The match is the "T" in the first "NOTNEEDED". A regex succeeds at the leftmost position where it can, and that T is the first A, C, G or T in the string. The same rule explains your second try. "NOTNEEDED" has no run of five such letters, so /[ACGT]{5}/ first succeeds at the start of the sequence and gives you ACGAC.

The square brackets create a character class, "either A or C or G or T", which takes up exactly one position. A quantifier after the class says how many of those positions come in a row. "*" means zero or more, "+" means one or more, and "{5}" means exactly five, where each of the five may be any of the four letters.

If I'm understanding you, you want the run of [ACGT]s delimited by non-ACGT characters. Your example has the /* ... */ markers, so

if ( $string =~ m#/\*([ACGT]+)\*/# ) {

would work, but I doubt your real data looks like that. Is the part you want the only stretch made up entirely of ACGTs? This sort of works:

if ( $string =~ m/[^ACGT]([ACGT]+)[^ACGT]/ ) {
    print "Matched: $1 in $string\n";
}

but on your sample it actually captures the lone "T" in "NOTNEEDED". That T is also surrounded by non-ACGT characters, and it comes first.

--
a

Andy Bach,
afb...@gmail.com
608 658-1890 cell
608 261-5738 wk
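Perl's leftmost-match rule explains the behaviour of both /[ACGT]/ and /[ACGT]/{5}. The engine tries each starting position from the left and returns the first place where the whole pattern fits. A greedy quantifier such as "+" only makes the match as long as possible from that starting point. It does not move the start.

This is a minimal sketch, not part of the original thread, that runs three patterns against the sample string and prints what each one captures. The variable names are illustrative.

```perl
use strict;
use warnings;

my $string = "/NOTNEEDED/*ACGACGGGTTCAAGGCAG*/NOTNEEDED/";

# [ACGT] alone is satisfied by the first A/C/G/T anywhere in the string,
# which is the "T" in the first "NOTNEEDED".
my ($one) = $string =~ /([ACGT])/;

# "NOTNEEDED" has no run of five ACGT letters, so the leftmost place
# where {5} can succeed is the start of the DNA sequence.
my ($five) = $string =~ /([ACGT]{5})/;

# + is greedy, but the match still starts at the leftmost candidate
# (the "T"), and the following "N" ends the run there.
my ($plus) = $string =~ /([ACGT]+)/;

print "one:  $one\n";     # one:  T
print "five: $five\n";    # five: ACGAC
print "plus: $plus\n";    # plus: T
```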
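Sometimes the /* */ markers can't be relied on. In that case one option is to collect every run of A/C/G/T letters and keep the longest. This sketch assumes the wanted sequence is the longest such run, so stray capitals like the "T" in "NOTNEEDED" lose out. The helper name longest_acgt_run is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper. Assumption: the DNA sequence is the longest run of
# A/C/G/T characters; shorter runs (e.g. a lone "T") are treated as noise.
sub longest_acgt_run {
    my ($s) = @_;
    my $best = '';
    # In list context, m//g returns the capture from every match. Because
    # + is greedy, each capture is a complete (maximal) ACGT run.
    for my $run ( $s =~ /([ACGT]+)/g ) {
        $best = $run if length $run > length $best;
    }
    return $best;    # empty string if there is no A/C/G/T at all
}

my $string = "/NOTNEEDED/*ACGACGGGTTCAAGGCAG*/NOTNEEDED/";
my $seq    = longest_acgt_run($string);
print "Matched: $seq\n";    # prints "Matched: ACGACGGGTTCAAGGCAG"
```

If the /* */ markers really are always present, the simpler m#/\*([ACGT]+)\*/# pattern is enough. The longest-run approach only helps when they are missing.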