Well, this is the final code I put together with everyone's help from this group:

#!/usr/bin/perl
use warnings;
use strict;

print "Enter the path of the INFILE to be processed:\n";
chomp (my $infile = <STDIN>);
open(INFILE, $infile) or die "Can't open INFILE for input: $!";

print "Enter in the path of the OUTFILE:\n";
chomp (my $outfile = <STDIN>);
open(OUTFILE, ">$outfile") or die "Can't open OUTFILE for output: $!";

print "Enter in the LENGTH you want the sequence to be:\n";
chomp (my $len = <STDIN>);

my ($name, @seq);
while ( <INFILE> ) {
    chomp;
    unless ( /^\s*$/ or s/^\s*>(.+)// ) {
        $name = $1;
        my @char = ( split( // ), ( '-' ) x ( $len - length ) );
        push @seq, ' '."@char $name";
    }
}
{
    local $" ="\n";
    print OUTFILE "R 1 [EMAIL PROTECTED]"; # The top of the file is supposed
}
close INFILE;
close OUTFILE;

Basically it will take this file:

>dog
atcgc
>cat
atcgctac
>mouse
agctata

and turn it into this:

R 1 10
a t c g c - - - - - dog
a t c g c t a c - - cat
a g c t a t a - - - mouse

However, I forgot that sometimes the input data looks like this:

>dog
agatgtagt
agtggttga
agggagc
>cat
gcatcgatg
agcatatgc
>mouse
actagcatc
acgtacgat

That is, the sequence of letters can span multiple lines. I would like the
above script to handle input whose sequences span several lines as well as
input whose sequences do not, and to produce the output described above.

You have all been a great help! I have really learned a lot from the help
you've given so far!

-Thanks!
-Mike

In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] (David Wall) wrote:

> --On Monday, August 25, 2003 6:50 PM -0400 Mike Robeson
> <[EMAIL PROTECTED]> wrote:
>
> > OK, I feel like an idiot. When I initially asked for help with this I
> > just realized that I forgot two little details. I was supposed to add
> > the number of sequences as well as the length of the sequences at the
> > top of the output file.
> >
> > That is this file:
> >
> >> dog
> > agatagatcgcatcga
> >> cat
> > acgcttcgatacgctagctta
> >> mouse
> > agatatacgggtt
> >
> > is really supposed to be:
> >
> > 3 22
> > a g a t a g a t c g c a t c g a - - - - - - dog
> > a c g c t t c g a t a c g c t a g c t t a - cat
> > a g a t a t a c g g g t t - - - - - - - - - mouse
> >
> > The '3' represents the number of individual sequences in the file (i.e.
> > dog, cat, mouse). And the 22 is the number of letters and dashes there
> > are. The length is already in the script as $len. I am able to get the
> > length listed at the top. However, I cannot find a way to have the
> > number of sequences (the 3 in this case) printed to the top.
>
> Here's one way (slightly altering John's solution), but it will use lots of
> memory if the sequences are long.
>
>
> #!/usr/bin/perl
> use warnings;
> use strict;
>
> my ($name, $num_seq, @seq);
> my $len = 30;
> while ( <DATA> ) {
>     unless ( /^\s*$/ or s/^\s*>(\S+)// ) {
>         my $name = $1;
>         my @char = ( /[acgt]/g, ( '-' ) x $len )[ 0 .. $len - 1 ];
>         push @seq, "@char $name";
>         $num_seq++;
>     }
> }
> {
>     local $" ="\n";
>     print "[EMAIL PROTECTED]";
> }
>
> __DATA__
> > dog
> agatagatcgcatcga
> > cat
> acgcttcgatacgctagctta
> > mouse
> agatatacgggt
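
For sequences that span several lines, one possible approach is to start a new record at each ">name" header and append every following non-blank line to that record until the next header appears. The sketch below is only an illustration under stated assumptions, not the code from this thread: the helper names read_records and format_record are hypothetical, the input is read from an in-memory string instead of a prompted file path, the width of 30 is an arbitrary choice, and the "count length" header line follows the "3 22" example quoted above. Padding and truncating to $len borrows the list-slice idea from the quoted reply.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read ">name" headers followed by one or more
# sequence lines. Returns a list of [name, sequence] pairs in input order.
sub read_records {
    my ($fh) = @_;
    my @records;
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^\s*$/;            # skip blank lines
        if ( $line =~ /^\s*>\s*(\S+)/ ) {    # a header starts a new record
            push @records, [ $1, '' ];
        }
        elsif (@records) {                   # continuation of the current record
            $line =~ s/\s+//g;               # drop stray whitespace
            $records[-1][1] .= $line;
        }
    }
    return @records;
}

# Hypothetical helper: space-separate the letters, pad with '-' up to $len
# columns (longer sequences are truncated to $len), then append the name.
sub format_record {
    my ( $name, $seq, $len ) = @_;
    my @char = ( split( //, $seq ), ('-') x $len )[ 0 .. $len - 1 ];
    return "@char $name";
}

# Sample input taken from the multi-line example in the post.
my $input = <<'END';
>dog
agatgtagt
agtggttga
agggagc
>cat
gcatcgatg
agcatatgc
>mouse
actagcatc
acgtacgat
END

my $len = 30;    # assumed target width
open my $fh, '<', \$input or die "Can't read in-memory input: $!";
my @records = read_records($fh);
close $fh;

# Assumed header line: number of sequences, then the padded length.
print scalar(@records), " $len\n";
print format_record( @$_, $len ), "\n" for @records;
```

Because the count is taken from the collected records, it is known before anything is printed, so the header line can go first. Single-line input is just the special case where each record receives exactly one sequence line.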