Well, this is the final code I put together with everyone's help from this
group:

#!/usr/bin/perl
use warnings;
use strict;

print "Enter the path of the INFILE to be processed:\n";

chomp(my $infile = <STDIN>);

open(my $in, '<', $infile)
      or die "Can't open '$infile' for input: $!";

print "Enter the path of the OUTFILE:\n";

chomp(my $outfile = <STDIN>);

open(my $out, '>', $outfile)
      or die "Can't open '$outfile' for output: $!";

print "Enter the LENGTH you want the sequence to be:\n";

chomp(my $len = <STDIN>);

my ($name, @seq);
while ( <$in> ) {
    chomp;
    next if /^\s*$/;                  # skip blank lines
    if ( /^\s*>\s*(.+)/ ) {           # header line: remember the name
        $name = $1;
        next;
    }
    # sequence line: split into letters and pad with '-' up to $len
    my @char = ( split( // ), ( '-' ) x ( $len - length ) );
    push @seq, ' ' . "@char       $name";
}

{
   local $" = "\n";                   # join the elements of @seq with newlines
   # header line at the top of the file, then one row per sequence
   print $out "R 1 $len\n@seq\n";
}

close $in;
close $out;



Basically it will take this file:

>dog
atcgc
>cat
atcgctac
>mouse
agctata


and turn it into this:
R 1 10
 a t c g c - - - - -       dog
 a t c g c t a c - -       cat
 a g c t a t a - - -       mouse
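Each row in that output comes from one record: split the sequence into letters, pad with '-' up to the requested length, and append the name. Here is a minimal sketch of just that step as a standalone sub. `format_row` is a hypothetical name that is not part of the script above, and the `$pad > 0` guard is an added assumption that avoids Perl's "Negative repeat count" warning when a sequence is longer than the length the user entered:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper (not in the original script): build one output row
# from a sequence name, its letters, and the target length.
sub format_row {
    my ($name, $seq, $len) = @_;
    my @char = split //, $seq;
    # Pad with '-' up to $len; a sequence longer than $len is left as-is,
    # like the original script (which never truncates).
    my $pad = $len - @char;
    push @char, ('-') x $pad if $pad > 0;
    return ' ' . join(' ', @char) . "       $name";
}

print "R 1 10\n";
print format_row(@$_, 10), "\n"
    for [dog => 'atcgc'], [cat => 'atcgctac'], [mouse => 'agctata'];
```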

However, I forgot that sometimes the input data looks like this:

>dog
agatgtagt
agtggttga
agggagc
>cat
gcatcgatg
agcatatgc
>mouse
actagcatc
acgtacgat

That is, the sequence of letters can span multiple lines. I would like
the script above to handle input whose sequences span several lines as
well as input whose sequences do not, and to produce the output described above.
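One way to approach this, sketched under some assumptions: keep appending sequence lines to the record opened by the most recent '>' header, and only pad and print after the whole file has been read. The sample text is held in a string and read through an in-memory filehandle purely for illustration; in the real script it would be the INFILE the user names. `$len` is hard-coded here, whereas the script reads it from STDIN. `@records` is an illustrative name:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Sample input with sequences spanning several lines (stands in for INFILE).
my $input = <<'END';
>dog
agatgtagt
agtggttga
agggagc
>cat
gcatcgatg
agcatatgc
>mouse
actagcatc
acgtacgat
END

my $len = 30;    # assumed here; the real script reads LENGTH from STDIN

# Read from the string as if it were a file (in-memory filehandle).
open(my $in, '<', \$input) or die "Can't open input: $!";

my @records;     # each element is [ name, sequence joined from all its lines ]
while (my $line = <$in>) {
    chomp $line;
    next if $line =~ /^\s*$/;             # skip blank lines
    if ($line =~ /^\s*>\s*(\S+)/) {       # header: start a new record
        push @records, [ $1, '' ];
    }
    elsif (@records) {                    # sequence line: append to the current record
        $line =~ s/\s+//g;                # drop any whitespace inside the line
        $records[-1][1] .= $line;
    }
}
close $in;

# Only now, with each sequence complete, pad it and print one row per record.
print "R 1 $len\n";
for my $rec (@records) {
    my ($name, $seq) = @$rec;
    my @char = split //, $seq;
    my $pad  = $len - @char;
    push @char, ('-') x $pad if $pad > 0;
    print ' ', join(' ', @char), "       $name\n";
}
```

Because a record is only finished when the next header (or the end of the file) arrives, this version handles one-line and multi-line sequences the same way. An array of pairs is used instead of a hash so that the input order is kept and duplicate names are not merged.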

You have all been a big help! I have really learned a lot from the help
you've given so far!

-Thanks!
-Mike



In article <[EMAIL PROTECTED]>,
 [EMAIL PROTECTED] (David Wall) wrote:

> --On Monday, August 25, 2003 6:50 PM -0400 Mike Robeson 
> <[EMAIL PROTECTED]> wrote:
> 
> > OK, I feel like an idiot. I just realized that when I initially asked
> > for help with this, I forgot two little details: I was supposed to add
> > the number of sequences as well as the length of the sequences at the
> > top of the output file.
> >
> > That is, this file:
> >
> > >dog
> > agatagatcgcatcga
> > >cat
> > acgcttcgatacgctagctta
> > >mouse
> > agatatacgggtt
> >
> > is really supposed to be:
> >
> > 3     22
> > a g a t a g a t c g c a t c g a - - - - - -    dog
> > a c g c t t c g a t a c g c t a g c t t a -    cat
> > a g a t a t a c g g g t t - - - - - - - - -    mouse
> >
> > The '3' represents the number of individual sequences in the file (i.e.,
> > dog, cat, mouse), and the 22 is the number of letters and dashes in each
> > row. The length is already in the script as $len. I am able to get the
> > length listed at the top. However, I cannot find a way to have the
> > number of sequences (the 3 in this case) printed at the top.
> 
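> Here's one way (a slight alteration of John's solution). Because the count
> has to be printed before the rows, it keeps every formatted row in memory
> until the end, so it will use lots of memory if the sequences are long.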
> 
> 
> #!/usr/bin/perl
> use warnings;
> use strict;
> 
> my ($name, $num_seq, @seq);
> my $len = 30;
> while ( <DATA> ) {
>     next if /^\s*$/;                      # skip blank lines
>     if ( /^\s*>\s*(\S+)/ ) {              # header line: remember the name
>         $name = $1;
>         next;
>     }
>     my @char = ( /[acgt]/g, ( '-' ) x $len )[ 0 .. $len - 1 ];
>     push @seq, "@char    $name";
>     $num_seq++;
> }
> {
>     local $" ="\n";
>     print "$num_seq     $len\n@seq\n";    # count and length, then the rows
> }
> 
> __DATA__
> >dog
> agatagatcgcatcga
> >cat
> acgcttcgatacgctagctta
> >mouse
> agatatacgggt
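If memory is a concern, one possible alternative (a sketch, not what was proposed in the thread) is to read the input twice. The first pass only counts the '>' headers so the "count length" line can be written first. The second pass rewinds with `seek` and writes each record as soon as the next header shows it is complete. This assumes a seekable input such as a regular file, so it will not work on a pipe or STDIN. Both handles here are in-memory strings only so the example is self-contained. `emit_row` is a hypothetical helper that reuses David's padding-and-slicing trick:

```perl
#!/usr/bin/perl
use warnings;
use strict;

my $input = <<'END';
>dog
agatagatcgcatcga
>cat
acgcttcgatacgctagctta
>mouse
agatatacgggtt
END

my $len = 22;

# In-memory handles stand in for the real INFILE and OUTFILE.
open(my $in,  '<', \$input)    or die "Can't open input: $!";
open(my $out, '>', \my $output) or die "Can't open output: $!";

# Pass 1: count the records (header lines) without storing any sequence.
my $num_seq = 0;
while (<$in>) {
    $num_seq++ if /^\s*>/;
}
print $out "$num_seq     $len\n";

# Pass 2: rewind and emit each record once the next header (or EOF) ends it,
# so only one record's sequence is held in memory at a time.
seek($in, 0, 0) or die "Can't rewind input: $!";
my ($name, $seq);
while (my $line = <$in>) {
    chomp $line;
    next if $line =~ /^\s*$/;
    if ($line =~ /^\s*>\s*(\S+)/) {
        emit_row($out, $name, $seq, $len) if defined $name;
        ($name, $seq) = ($1, '');
    }
    elsif (defined $name) {
        $line =~ s/\s+//g;
        $seq .= $line;               # sequences may span several lines
    }
}
emit_row($out, $name, $seq, $len) if defined $name;    # last record
close $in;
close $out;
print $output;

# Hypothetical helper: pad with '-' and keep exactly $len characters
# (the list-slice trick from the quoted code).
sub emit_row {
    my ($fh, $name, $seq, $len) = @_;
    my @char = ( split(//, $seq), ('-') x $len )[ 0 .. $len - 1 ];
    print $fh "@char    $name\n";
}
```

The cost is reading the file twice. In exchange, memory use is bounded by the longest single sequence rather than by the whole file.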
