> Ah, but can you trust the first record? If it is a relatively short
> sequence it may be on one line, but later sequences may wrap. Depends on
> the record limit.
>
> As to the format name .... a name beginning 'fasta-' would be easiest to
> document. For FASTQ we used fastq, fastq-sanger, fastq-solexa, and so on.
>
Indeed, file sampling isn't watertight, but I still think the programmatic
equivalent of this:

    head -n 2000 file | grep '^>' | wc --lines

(where the output is 1000 if the sampled records are unwrapped, since each
record then takes exactly two lines: a header and one sequence line) is much
faster than a watertight, bulletproof check, given the very large files being
handled. Besides, when did I last see a FASTA file with 580 long sequence
lines? I can't think of one.

Niels L

> regards,
>
> Peter Rice
> EMBOSS Team
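
For illustration, the sampling check described above could be sketched in Python roughly as follows. This is only a minimal sketch, not EMBOSS code: the function name sample_is_unwrapped and the sample_lines parameter are made up for this example, and it inherits the same weakness as the shell pipeline, namely that it says nothing about records beyond the sampled lines.

```python
from itertools import islice


def sample_is_unwrapped(lines, sample_lines=2000):
    """Guess whether FASTA records are unwrapped from the first lines only.

    `lines` is any iterable of text lines, for example an open file handle.
    Mirrors `head -n 2000 file | grep '^>' | wc --lines`: in an unwrapped
    file every record is one header line plus one sequence line, so the
    number of header lines is exactly half the number of sampled lines.
    """
    sample = list(islice(lines, sample_lines))
    headers = sum(1 for line in sample if line.startswith(">"))
    # An empty sample, or one without headers, gives no evidence either way,
    # so report False rather than guess.
    return headers > 0 and 2 * headers == len(sample)


# Hypothetical usage; "seqs.fasta" is a placeholder file name:
#     with open("seqs.fasta") as handle:
#         print(sample_is_unwrapped(handle))
```

Because only the first sample_lines lines are read, the cost does not grow with file size, which is the point of sampling. A stricter variant could also require that header and sequence lines alternate, at the same cost.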