In article <[EMAIL PROTECTED]>, Robin Garbutt wrote:

> Hi all,
>
> I have a string that is a random sequence like the following:-
>
> ACGTCGTCGTCACACACACGCGTCTCTATACGCG
>
> I want to be able to parse the string, picking out any TATA sequences,
> colour them in red and make a note of where they lie in the sequence.
>
> Is this possible with perl?

And much more (though not necessarily from me ;-))

Here is my version, which runs in a terminal window and produces output something like this (hits in red):

Line# : Char# : Matches
    1 :    11 : AGTGTAGAGTTCTTCATTTTTTACGGACGGTCCGACCGCTGGATCTAGAG
    1 :    44 : AGTGTAGAGTTCTTCATTTTTTACGGACGGTCCGACCGCTGGATCTAGAG
    5 :     7 : CTGTATTCTTGAAAGTCCCCCAGCATCCAGGCCATTATCGAATATCGACT
    6 :     3 : TTTCTTGCAAGTTAATGGTAGACCTACAGTTGGGGAACTGAGTATCCCAG

Notice that I print multiple hits on the same line as separate output lines. I suppose the fancier format would be something like:

Line# : Char# : Matches
    1 : 11,44 : AGTGTAGAGTTCTTCATTTTTTACGGACGGTCCGACCGCTGGATCTAGAG
    5 :     7 : CTGTATTCTTGAAAGTCCCCCAGCATCCAGGCCATTATCGAATATCGACT
    6 :     3 : TTTCTTGCAAGTTAATGGTAGACCTACAGTTGGGGAACTGAGTATCCCAG

I used the substr function (then afterwards remembered that index might be better/easier for this); I also imagine that the slicker way to do this is probably with regexes (cue, John Krahn one-liner enters from stage left...). ;-)

-K

(as always, advice, criticism welcome)

#!/usr/bin/perl
use warnings;
use strict;

# find_substring
# I have a string that is a random sequence like the following:-
#
# ACGTCGTCGTCACACACACGCGTCTCTATACGCG
#
# I want to be able to parse the string, picking out any TATA sequences,
# colour them in red and make a note of where they lie in the sequence.

my $sequence = 'TATA';    # what we are looking for

while (@ARGV) {
    my $data = shift;
    open my $fh, '<', $data
        or die "Couldn't open datafile $data for reading: $!\n";

    print "\nLine# : Char# : Matches\n";    # print heading
    while (<$fh>) {
        chomp;
        print matches($., $_, $sequence);
    }
    close $fh;    # closing resets $. so line numbers restart for the next file
}
# end main

#
# begin sub
#
sub matches {
    my ($line_nbr, $line, $seq) = @_;
    my @matches;

    # try every start position at which $seq could still fit
    for my $char_position (0 .. length($line) - length($seq)) {
        my $substring = substr $line, $char_position, length $seq;
        if ($substring eq $seq) {
            my $hilite_line = hilite($line, $char_position, $seq);
            # report a 1-based character position
            push @matches, sprintf "%5d : %5d : %s\n",
                $line_nbr, $char_position + 1, $hilite_line;
        }
    }
    return @matches;
}

sub hilite {
    my ($line, $char_pos, $seq) = @_;
    my $color_on  = "\e[31;1m";    # ANSI bold red
    my $color_off = "\e[0m";       # ANSI reset

    # four-argument substr replaces the hit with its coloured version
    substr($line, $char_pos, length($seq), "$color_on$seq$color_off");
    return $line;
}

## end ##

--
Kevin Pfeiffer
International University Bremen
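
The regex route and the "fancier" one-line-per-input-line format mentioned above could look roughly like the following. This is only a minimal sketch, not the one-liner alluded to: the helper names hit_positions, hilite_all and report_line and the sample string are made up for illustration. It assumes an ANSI-capable terminal, and the substitution colours non-overlapping hits only, so in a run like TATATA the second, overlapping hit is reported by position but only partly coloured.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: 1-based start positions of every occurrence of
# $seq in $line. The lookahead makes each match zero-length, so
# overlapping occurrences are found too.
sub hit_positions {
    my ($line, $seq) = @_;
    my @positions;
    while ($line =~ /(?=\Q$seq\E)/g) {
        push @positions, $-[0] + 1;    # $-[0] is the 0-based match start
    }
    return @positions;
}

# Hypothetical helper: wrap each non-overlapping occurrence of $seq
# in ANSI bold red.
sub hilite_all {
    my ($line, $seq) = @_;
    my ($on, $off) = ("\e[31;1m", "\e[0m");
    (my $copy = $line) =~ s/(\Q$seq\E)/$on$1$off/g;
    return $copy;
}

# Hypothetical helper: one report line per input line, with all hit
# positions joined by commas; returns an empty string when nothing matches.
sub report_line {
    my ($line_nbr, $line, $seq) = @_;
    my @positions = hit_positions($line, $seq);
    return '' unless @positions;
    return sprintf "%5d : %s : %s\n",
        $line_nbr, join(',', @positions), hilite_all($line, $seq);
}

# Made-up sample input, not taken from the original data file.
print "Line# : Char# : Matches\n";
print report_line(1, 'GGTATACCTATATAGG', 'TATA');
```

For the sample string the reported positions are 3, 9 and 11; the hit at 11 overlaps the one at 9, which is why it shows up in the position list without getting its own full highlight.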