> Ah, but can you trust the first record? If it is a relatively short 
> sequence it may be on one line, but later sequences may wrap. Depends on 
> the record limit.
> 
> As to the format name .... a name beginning 'fasta-' would be easiest to 
> document. For FASTQ we used fastq, fastq-sanger, fastq-solexa, and so on.
> 

Indeed, file sampling isn't watertight, but I still think the
programmatic equivalent of this:

  head -n 2000 file | grep '^>' | wc --lines

(where the output is 1000 if the file is unwrapped, because every record
is then exactly one header line plus one sequence line) is much faster
than a watertight, bulletproof check, given the very large files being
handled. Besides, when did I last see a FASTA file with 580 long sequence
lines? I can't think of one.
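A minimal Python sketch of that sampling check, assuming a plain
(uncompressed) FASTA file. The function name looks_unwrapped and the
2000-line default are illustrative only, not part of EMBOSS. Like the
shell pipeline, it is a heuristic and not watertight:

```python
from itertools import islice


def looks_unwrapped(path, sample_lines=2000):
    """Heuristic check for one-line-per-sequence FASTA.

    Reads at most `sample_lines` lines (like `head -n 2000`), counts the
    header lines starting with '>' (like `grep '^>' | wc --lines`), and
    reports True when headers make up exactly half of the sample, which
    is what an unwrapped file (header + one sequence line) produces.
    """
    with open(path) as handle:
        sample = list(islice(handle, sample_lines))
    # An empty file, or one not starting with a header, is not treated as FASTA.
    if not sample or not sample[0].startswith(">"):
        return False
    headers = sum(1 for line in sample if line.startswith(">"))
    return headers * 2 == len(sample)
```

Only the first records are inspected, so a file whose later sequences
wrap will still be reported as unwrapped; that trade-off is the point
of sampling very large files.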

Niels L

> regards,
> 
> Peter Rice
> EMBOSS Team

_______________________________________________
EMBOSS mailing list
EMBOSS@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss