In article <[EMAIL PROTECTED]>, 
Robin Garbutt wrote:

> Hi all,
> 
> I have a string that is a random sequence like the following:-
> 
> ACGTCGTCGTCACACACACGCGTCTCTATACGCG
> 
> I want to be able to parse the string, picking out any TATA sequences,
> colour them in red and make a not of where ther lie in the sequence.
> 
> Is this possible with perl?

And much more (though not necessarily from me ;-))

Here is my version using a terminal window and with output something like 
this (hits in red):
Line# : Char# : Matches
    1 :    11 : AGTGTAGAGTTCTTCATTTTTTACGGACGGTCCGACCGCTGGATCTAGAG
    1 :    44 : AGTGTAGAGTTCTTCATTTTTTACGGACGGTCCGACCGCTGGATCTAGAG
    5 :     7 : CTGTATTCTTGAAAGTCCCCCAGCATCCAGGCCATTATCGAATATCGACT
    6 :     3 : TTTCTTGCAAGTTAATGGTAGACCTACAGTTGGGGAACTGAGTATCCCAG

Notice I print same-line multiple hits on separate lines. I suppose the 
fancier format would be something like:

Line# : Char# : Matches
    1 : 11,44 : AGTGTAGAGTTCTTCATTTTTTACGGACGGTCCGACCGCTGGATCTAGAG
    5 :     7 : CTGTATTCTTGAAAGTCCCCCAGCATCCAGGCCATTATCGAATATCGACT
    6 :     3 : TTTCTTGCAAGTTAATGGTAGACCTACAGTTGGGGAACTGAGTATCCCAG

I used the substr function (then afterwards remembered that index might be 
better/easier for this); I also imagine that the slicker way to do this is 
probably with regexes (cue, John Krahn one-liner enters from stage 
left...). ;-)

-K (as always, advice, criticism welcome)


#!/usr/bin/perl
use warnings;
use strict;

# find_substring

# I have a string that is a random sequence like the following:-
#
# ACGTCGTCGTCACACACACGCGTCTCTATACGCG
#
# I want to be able to parse the string, picking out any TATA sequences,
# colour them in red and make a not of where ther lie in the sequence.

while (@ARGV) {

   my $sequence = 'TATA';                       #what we are looking for

   my $data = shift;
   open FH, "< ", $data
     or die "Couldn't open datafile $data for reading: $!\n";

   printf "\nLine# : Char# : Matches\n";        # print heading

   while (<FH>) {
      chomp;
      print matches($., $_, $sequence);
   }
}

# end  main #
# begin sub #

sub matches{
   my @matches;
   my ($line_nbr, $line, $seq) = @_;

   for (0 .. (length($line) - length($seq)) ) {
      my $char_position = $_;
      my $substring = substr $line, $char_position, length $seq;
      if ($substring eq $seq) {
         my $hilite_line = hilite($line, $char_position, $seq);
         $_++;                          # add 1 to char position
         push @matches, sprintf "%5d : %5d : %s\n", $line_nbr, $_, 
$hilite_line;
      }
   }
   return @matches;
}

sub hilite {
   my $color_on = "\e[31;1m";
   my $color_off = "\e[0m";
   my ($line, $char_pos, $seq) = @_;
   substr($line, $char_pos, length($seq), "$color_on$seq$color_off");
   return $line;
}
## end ##


-- 
Kevin Pfeiffer
International University Bremen

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to