Re: another parsing script :)

John SJ Anderson Fri, 13 May 2011 09:45:04 -0700

On Fri, May 13, 2011 at 11:46, Nathalie Conte <n...@sanger.ac.uk> wrote:
> I have a file with sequences each sequence is 200 pb long and I have 30K
> lines
>
> ATGGATAGATA\n
> TTCGATTCATT\n
> GCCTAGACAT\n
> TTGCATAGACTA\n
> I want to calculate the AT ratio of each  base based on their position
>  (3/4) for the 1st position, 3/4 on the second, (0/4) on the 3rd...
[ snip ]
> foreach my $line (@list){
>    chomp $line;
>     my @pb = split(/\d/, $line);
>   my @position = $pb[0]; for the fisrt position
>       $line++;
>
> do that in a loop 200 times ( as we have 200 pb per sequence) which will
> create 200 arrays with 30K digits in them. I would need an array of all
> arrays at that point???


You don't need to do it 200 times; you can loop over the file once.
Something like this (caveat, untested):

my( @AT_count , $total );

open( my $FH , '<' , $file ) or die( $! );
while( <$FH> ) {
  my @bases = split // , $_;
  foreach my $idx ( 0 .. $#bases ) {
    $AT_count[$idx]++ if $bases[$idx] eq 'A' or $bases[$idx] eq 'T';
    $total++;
  }
}
close( $FH );

foreach my $position ( 0 .. $#AT_count) {
  printf "%.2f%% AT at position %d\n" , $AT_count[$position] / $total
* 100 , $position + 1;
}

Basically you're going to keep an array where each element corresponds
to the number of A or T residues at a given position across all your
sequences. So the first element is the total number of [AT] residues
in the first column, second element is [AT] in the second column, and
so on.

Read each line in the file, split into individual bases, loop over
each base, and if it is an A or a T, increment the corresponding
position in the array.

At the end, loop over your count array and convert each element into a
percentage and print it out.

NOTE: If some of the sequences are different lengths, you'll need to
store the total number of positions in an array too. You said they
were all 200bp long, so I only used a single counter for the total.

chrs,
john.

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/

Re: another parsing script :)

Reply via email to