Re: [MacPerl] multi files to multi arrays

Ken Williams Tue, 10 Apr 2001 22:55:54 -0700
Hi Eric,

A couple of suggestions follow for how to make this code a bit more
idiomatic (I had kind of a tough time reading it).

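1) use chomp() instead of chop().  chomp() removes only a trailing
newline, while chop() removes the last character whatever it is.  In
truth it doesn't matter much here, but people will think you don't know
chomp() if you're using chop() without a good reason.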

2) turn on strict syntax by putting "use strict;" at the top of the
script.  For one thing, this makes you declare all your variables, which
would draw attention to the fact that you're using $n (lower-case) only
once in the script, which is probably an error.  (Running with -w warns
about used-once variables directly.)  Declaring everything is in the end
a good thing because it helps prevent these errors.

3) The code
     $seq .= $_ if $_ !~/\>/;
   can be written as
     $seq .= $_ unless /\>/;
   which helps its readability.

4) Better to use 'my' than 'local'.  Using strict syntax will push you
in this direction, since 'local' on an undeclared variable no longer
compiles.

5) The indentation of your 'while' and 'for' loops is off, which makes
it hard to follow the nesting.

6) You have the line
     open (FD,"<$fname") || die "cannot open $fname!\n";
   which I see sometimes.  It's more useful to do
     open (FD,"<$fname") || die "cannot open $fname: $!\n";
   because $! holds the system's reason for the failure, so you get a
   better error message, and it's a bit more polite.
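
To make suggestion 6 concrete, here is a minimal sketch of what the
extra $! buys you.  The helper name open_error and the file name are
made up for illustration, and the three-argument open with a lexical
filehandle is a substitution for the bareword FD used in the script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the message die() would have printed,
# or an empty string if the file opened fine.
sub open_error {
    my ($fname) = @_;
    if (open(my $fh, '<', $fname)) {
        close $fh;
        return '';
    }
    # $! holds the operating system's reason for the failure
    return "cannot open $fname: $!\n";
}

# The file name below is assumed not to exist.
print open_error('no-such-file.fsa');
```

The exact wording after the colon comes from the operating system, so
it varies between platforms.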

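Suggestions 1 and 3 can be seen together in a small sketch.  The helper
name build_seq and the sample lines are assumptions for illustration
only; the header test is the same /\>/ match used in the script.

```perl
use strict;
use warnings;

# Hypothetical helper: joins sequence lines, skipping header lines
# that contain '>'.
sub build_seq {
    my @lines = @_;
    my $seq = '';
    foreach (@lines) {
        chomp;                    # removes a trailing newline only if present
        $seq .= $_ unless /\>/;   # same test as: if $_ !~ /\>/
    }
    return $seq;
}

# chop() removes the last character unconditionally.
my $last = "ACGT";   # a final line with no trailing newline
chop $last;          # $last is now "ACG": a base was lost
print "$last\n";

# The last line has no newline, and chomp leaves it intact.
print build_seq(">header\n", "ACGT\n", "TTGA"), "\n";   # ACGTTTGA
```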

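For suggestions 2 and 4, the following sketch shows how 'local' and
'my' differ, and what strict does with an undeclared variable.  All the
sub names here are hypothetical, and $seq is declared with 'our' only
so that the example compiles under strict.

```perl
use strict;
use warnings;

our $seq = 'global';          # a package variable, declared for strict

sub show { return $seq }      # always reads the package variable

sub with_local {
    local $seq = 'localized'; # temporarily changes the package variable
    return show();            # callees see the temporary value
}

sub with_my {
    my $seq = 'lexical';      # a new variable visible only in this block
    return show();            # callees still see the package variable
}

# Hypothetical helper: compiles a snippet that uses an undeclared $n
# and returns the resulting error message.
sub strict_error {
    my $ok = eval q{ use strict; $n++; 1 };
    return $ok ? '' : $@;
}

print with_local(), "\n";     # localized
print with_my(), "\n";        # global
print strict_error();         # complains that $n was never declared
```
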
Now, some questions:

1) Is the line
   $hash = $hash+$fname;
intended to set $hash to 1, then 3, then 6?  That's what it's doing,
because the addition converts $fname to a number, using only its leading
digits ("1.fsa" becomes 1).  So it's undef+1=1, then 1+2=3, then 3+3=6.

2) Also, the variable $hash is never used later in your program.  It is
distinct from the variable %hash, which appears later.

3) I'm not following the intended behavior of the interior loops.  Do
you need to reset the $seq variable?  Perhaps you could describe what
you're trying to do.  Are you simply counting character n-grams (where
n=20) in the three files?  There are probably simpler ways to do that. 
One way follows.

==========================================================
use strict;
my $N = 20;

@ARGV = ("1.fsa", "2.fsa", "3.fsa");
my %hash;

foreach my $fname ( @ARGV )
{
  open (FD,"<$fname") || die "cannot open $fname: $!\n";
  my $text = do {local $/; <FD>};  # An idiom for slurping whole files
  $text =~ s/^>.*//mg;  # Strip header lines
  $text =~ s/\n//g;     # Strip newlines
  close FD;
  
  while ($text =~ /(.{$N})/go) {
    $hash{$fname}{$1}++;  # Increment counter for this n-gram in this file
    pos($text) -= $N-1;   # Back up the match position
  }
}

foreach my $file (@ARGV) {
  foreach my $ngram (keys %{$hash{$file}}) {
    print "$ngram: $hash{$file}{$ngram}\n";
  }
}
==========================================================
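
The matching loop above may be easier to follow on a short in-memory
string.  This is only a sketch: the helper name count_ngrams and the
sample text are assumptions, but the regex and the pos() adjustment are
the same as in the script.

```perl
use strict;
use warnings;

# Hypothetical helper: counts overlapping n-grams in a string and
# returns a reference to a hash of counts.
sub count_ngrams {
    my ($text, $n) = @_;
    my %count;
    while ($text =~ /(.{$n})/g) {
        $count{$1}++;
        # After a match, pos() sits just past the n-gram.  Backing up
        # by $n-1 restarts the search one character after the point
        # where this match began, so the windows overlap.
        pos($text) -= $n - 1;
    }
    return \%count;
}

my $counts = count_ngrams("ABABA", 2);
print "$_: $counts->{$_}\n" for sort keys %$counts;
# AB: 2
# BA: 2
```

Without the pos() adjustment the windows would not overlap, and
"ABABA" would yield only the two "AB" matches.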

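Questions 1 and 2 can be checked with a short sketch.  The helper name
running_totals is hypothetical, and the two warning categories are
switched off only because the example triggers them on purpose.

```perl
use strict;
use warnings;
no warnings qw(numeric uninitialized);  # the demo triggers both on purpose

# Hypothetical helper: repeats "$hash = $hash + $fname" over a list of
# names and returns the value of $hash after each step.
sub running_totals {
    my @fnames = @_;
    my $hash;                     # starts out undef, which counts as 0
    my @steps;
    foreach my $fname (@fnames) {
        $hash = $hash + $fname;   # "1.fsa" is used as the number 1
        push @steps, $hash;
    }
    return @steps;
}

print join(", ", running_totals("1.fsa", "2.fsa", "3.fsa")), "\n";   # 1, 3, 6

# $hash and %hash are separate variables that merely share a name.
my $hash = 6;
my %hash;
$hash{ACGT}++;                    # changes %hash only
print "$hash $hash{ACGT}\n";      # 6 1
```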

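If what you want really is one array per file, the same idea of keying
a hash by file name works with array references.  This is a sketch
under assumptions: the helper name lines_by_file and the sample file
names and contents are invented, and it uses a lexical filehandle
rather than FD.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: returns a hash reference mapping each file name
# to a reference to an array of that file's chomped lines.
sub lines_by_file {
    my @fnames = @_;
    my %lines;
    foreach my $fname (@fnames) {
        open(my $fh, '<', $fname) || die "cannot open $fname: $!\n";
        chomp(my @rows = <$fh>);
        close $fh;
        $lines{$fname} = \@rows;   # one array per file
    }
    return \%lines;
}

# Demo with two throwaway files (names and contents are made up).
my $dir = tempdir(CLEANUP => 1);
my %content = ("a.fsa" => ">one\nACGT\n", "b.fsa" => ">two\nTT\nGG\n");
foreach my $name (keys %content) {
    open(my $out, '>', "$dir/$name") || die "cannot write $name: $!\n";
    print {$out} $content{$name};
    close $out;
}

my $lines = lines_by_file("$dir/a.fsa", "$dir/b.fsa");
print scalar(@{ $lines->{"$dir/b.fsa"} }), "\n";   # 3 lines in b.fsa
print $lines->{"$dir/a.fsa"}[1], "\n";             # ACGT
```
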
[EMAIL PROTECTED] (Eric W Dahlstrom) wrote:
>Hi,
>I am a newbie to Perl and I am trying to bring several files into their own
>separate arrays for processing in a single script.
>Currently I am able to read in the files, but they all dump into a single
>array.  I need to create a new array for each file, but I am not sure how
>this is done.
>Is there an easy way to increment the array for each file?
>
>Thank You.
>
>
>
>
>$N = 20; 
>
>@ARGV = ("1.fsa", "2.fsa", "3.fsa");
>
>
>foreach $fname ( @ARGV )
> {
>   local $seq;
>   open (FD,"<$fname") || die "cannot open $fname!\n";
>   $hash = $hash+$fname;
>   while ( <FD> )     {
>    
>       chop;    # drop newline at end of line
>       $seq .= $_ if $_ !~/\>/;    # ignore header
>       
>       }
>       
>       for ($i=0; $i+$N < length($seq); $i++) {
>           $n++;
>           $ngram = substr($seq,$i,$N);
>           $hash{$ngram}++;
>           print $hash{$ngram};
>       
>     
>   
> }  
>   
>   $name[$namcnt++] = $fname;
>
>   close ( FD );   
> }
>
>
>E W Dahlstrom                  [EMAIL PROTECTED]
>Novato Ca
>
>
>
>

  -------------------                            -------------------
  Ken Williams                             Last Bastion of Euclidity
  [EMAIL PROTECTED]                            The Math Forum