On 13/05/2011 16:46, Nathalie Conte wrote:

I have a file with sequences; each sequence is 200 pb long, and I have 30K
lines

ATGGATAGATA\n
TTCGATTCATT\n
GCCTAGACAT\n
TTGCATAGACTA\n

Does your data look like this? With 10, 11, or 12 characters per line?
I'm afraid I don't know what a pb is. Are you saying that each line is 200 characters long?

I want to calculate the AT ratio of each base based on its position:
(3/4) for the 1st position, 3/4 on the second, (0/4) on the 3rd...
I am a beginner, so please excuse my Perl thinking!

my plan was to put everything in arrays, split on the digit and then
for each line put the 1st digit in another array,
my $fh ="./txt" ;
unless (open(REGIONS, $fh)){
        print "Cannot open file \n";
}

OK, this has been mentioned before, but you should at least die instead
of just printing an error and continuing. The error message should
include the $! built-in variable, and ideally you would also use a
lexical file handle and the three-parameter form of open. Idiomatic Perl
would look like this:

  my $filename = "./txt";
  open my $regions, '<', $filename or die "Cannot open '$filename': $!";


my @list = <REGIONS>;
close REGIONS;

Instead of reading the entire file into memory, especially with the
amount of data you have, you should read and process the file one line
at a time:

  while (my $line = <$regions>) {
    chomp $line;
    # process $line here
  }

foreach my $line (@list){
     chomp $line;
      my @pb = split(/\d/, $line);
    my @position = $pb[0]; # for the first position
        $line++;

I'm afraid I don't follow your code. Although I can see that it
corresponds to your design above, there are no digits in the sample data
you show. Also, you are incrementing $line at the end, but $line holds
the text of the most recent line read from the file, so incrementing it
does not move on to the next character or the next line.
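
For what it's worth, split(/\d/, $line) splits wherever it finds a digit,
so on a line made up only of letters it hands back the whole line as a
single element. If the intention was one character per array element, an
empty pattern does that. A small sketch, using the first line of your
sample as the input:

```perl
use strict;
use warnings;

my $line = 'ATGGATAGATA';

# No digits to split on, so the whole line comes back as one element
my @by_digit = split /\d/, $line;
print scalar(@by_digit), "\n";    # prints 1

# An empty pattern splits between every character
my @bases = split //, $line;
print scalar(@bases), "\n";       # prints 11
print "$bases[0] $bases[2]\n";    # prints "A G"
```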

Do that in a loop 200 times (as we have 200 pb per sequence), which will
create 200 arrays with 30K digits in them. Would I need an array of all
the arrays at that point?

From them, use a conditional loop assessing the A or T composition of each
array in the big array, count them with a counter, and divide by the
size of each array.

Could you please help me with this?

I think the problem isn't a difficult one, but I am having problems
understanding what you need to do. Could you post a reasonable sample of
data and the corresponding output that you require? Perhaps it would
help to explain what pb, AT ratio, base, and so on mean in terms of the
data in the file.
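
In the meantime, going only by the 3/4, 3/4, 0/4 figures in your example,
my guess is that for each character position you want the fraction of
lines that have an A or a T there. If that guess is right, something
along these lines might be a starting point. It is only a sketch: the
name at_ratio_by_position is made up for illustration, the four sample
lines are read through an in-memory file handle instead of your real
file, and I have assumed that a short line simply doesn't count towards
the positions it lacks.

```perl
use strict;
use warnings;

# Hypothetical helper: for each character position, return the fraction
# of lines whose character at that position is A or T.
sub at_ratio_by_position {
    my ($fh) = @_;
    my (@at, @total);

    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;              # skip blank lines
        my @bases = split //, uc $line;        # one character per element
        for my $i (0 .. $#bases) {
            $total[$i]++;                      # lines that reach position $i
            $at[$i]++ if $bases[$i] =~ /[AT]/; # of those, how many are A or T
        }
    }

    # Assumption: the denominator is the number of lines long enough
    # to have a character at that position
    return map { ($at[$_] || 0) / $total[$_] } 0 .. $#total;
}

# The four sample lines from the original post
my $sample = "ATGGATAGATA\nTTCGATTCATT\nGCCTAGACAT\nTTGCATAGACTA\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @ratio = at_ratio_by_position($fh);
close $fh;

# Print position (counting from 1) and ratio, tab separated
printf "%d\t%.2f\n", $_ + 1, $ratio[$_] for 0 .. $#ratio;
```

On the sample this gives 0.75, 0.75 and 0.00 for the first three
positions, which agrees with the figures you quote. Only two small
arrays of counters are kept, so the 30K lines never need to be held in
memory at once.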

Cheers,

Rob
