Re: [EMBOSS] fasta single-line sequence format?

Peter Rice Tue, 27 Aug 2013 09:07:50 -0700

On 27/08/2013 16:18, Niels Larsen wrote:

Niels: Re: 'Most genome packages use it': can you specify?  Most genome
packages I know allow the flexibility to use standard line-wrapped FASTA
as well, so coding an indexing scheme for a non-standard FASTA alone
seems… tricky.  Unless you intend on allowing both, and 'unwrapped' is
just for optimization.

C Fields: Yes, read both but write unwrapped (by default) so that steps
in a workflow can use the faster unwrapped format. Read routines that
"taste" the file by looking at the first record and derive the format
are much faster than when reading/writing wrapped. And less work for the
user/caller.

Ah, but can you trust the first record? If it is a relatively short
sequence it may be on one line, but later sequences may wrap. Depends on
the record limit.
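
To make the concern concrete, here is a minimal Python sketch (not EMBOSS
code; the function names are hypothetical and chosen only for
illustration). It compares "tasting" just the first record with scanning
every record, assuming a plain FASTA file where header lines start with
'>' and every other non-blank line is sequence.

```python
import io


def first_record_is_unwrapped(handle):
    """Taste only the first record: True if its sequence is on one line."""
    seen_header = False
    seq_lines = 0
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if seen_header:
                break  # reached the second record, stop tasting
            seen_header = True
        elif line and seen_header:
            seq_lines += 1
    return seen_header and seq_lines <= 1


def all_records_unwrapped(handle):
    """Scan the whole file: True only if no record wraps its sequence."""
    seen_header = False
    seq_lines = 0
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            seen_header = True
            seq_lines = 0  # new record, restart the count
        elif line and seen_header:
            seq_lines += 1
            if seq_lines > 1:
                return False  # this record spans several lines
    return seen_header


# A short first record on one line, followed by a wrapped record.
example = ">short\nACGT\n>long\nACGTACGT\nACGTACGT\n"
tasted = first_record_is_unwrapped(io.StringIO(example))
scanned = all_records_unwrapped(io.StringIO(example))
```

For this input the taste of the first record reports single-line while
the full scan does not, which is the case where trusting the first
record alone would mislead a reader.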

As to the format name .... a name beginning 'fasta-' would be easiest to
document. For FASTQ we used fastq, fastq-sanger, fastq-solexa, and so on.


regards,

Peter Rice
EMBOSS Team