On 20/07/10 17:27, Peter C. wrote:
> Hi all,
>
> Is there a tool in EMBOSS to just count the number of sequences in a file?
>
> Right now I could handle this by using seqret to convert the file into FASTA
> and then pipe that through grep to count the records. But an EMBOSS tool
> would be more elegant, e.g.
>
> $ countseq -sformat=genbank gbvrt1.seq
> 31065
>
> For the implementation you might offer the choice between using the normal
> EMBOSS parsing (as in seqret) versus file format specific regular expression
> searches which just look for marker lines (without checking validity) which
> should be really fast.

Very easy to write ... you could do it yourself for practice (we will help, of course). Just use seqret as the basis, don't write any sequences out, but add an outfile for the results.

We will add countseq to the next release.

regards,

Peter Rice
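
For illustration only, the "marker line" idea from the quoted message could be sketched in Python roughly as follows. This is not the EMBOSS implementation and not how countseq is or will be written; the names RECORD_MARKERS and count_records are hypothetical. It assumes each record begins with a recognisable line (">" for FASTA, "LOCUS" for GenBank, "ID" for EMBL) and, as the original poster noted, it does no validity checking at all.

```python
import re
import sys

# Hypothetical table: one regular expression per format, matching the
# line that opens each record. No validation of the record is attempted.
RECORD_MARKERS = {
    "fasta": re.compile(r"^>"),
    "genbank": re.compile(r"^LOCUS\s"),
    "embl": re.compile(r"^ID\s"),
}


def count_records(lines, sformat):
    """Count lines that look like the start of a record in the given format."""
    marker = RECORD_MARKERS[sformat]
    return sum(1 for line in lines if marker.match(line))


# Command-line use: python countmarkers.py genbank gbvrt1.seq
if __name__ == "__main__" and len(sys.argv) == 3:
    sformat, path = sys.argv[1], sys.argv[2]
    with open(path) as handle:
        print(count_records(handle, sformat))
```

Because the file is read line by line and never parsed into sequence objects, this should be fast on large files, but a malformed or truncated record would still be counted. The seqret-based route suggested in the reply trades speed for proper parsing.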
