On 27/08/2013 16:18, Niels Larsen wrote:
Niels: Re: 'Most genome packages use it': can you specify? Most genome
packages I know allow the flexibility to use standard line-wrapped FASTA
as well, so coding an indexing scheme for a non-standard FASTA alone
seems… tricky. Unless you intend on allowing both, and 'unwrapped' is
just for optimization.
C Fields: Yes, read both but write unwrapped (by default) so that steps in
a workflow can use the faster unwrapped format. Read routines that "taste"
the file by looking at the first record and deriving the format are much
faster than when reading/writing wrapped. And less work for the user/caller.
Ah, but can you trust the first record? If it is a relatively short
sequence it may be on one line, but later sequences may wrap. Depends on
the record limit.
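
To illustrate the concern, here is a minimal Python sketch (not EMBOSS
code; the function name is_unwrapped_fasta and the max_records parameter
are made up for this example). It treats a file as 'unwrapped' only if
every record it checks has its sequence on a single line, and shows how
tasting just the first record can give the wrong answer:

```python
import io


def is_unwrapped_fasta(handle, max_records=None):
    """Return True if every checked record has at most one sequence line.

    max_records is a hypothetical 'record limit': None means check the
    whole file, 1 means taste only the first record.
    """
    records_seen = 0
    seq_lines = 0  # sequence lines seen so far in the current record
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            records_seen += 1
            if max_records is not None and records_seen > max_records:
                break  # reached the record limit without seeing a wrap
            seq_lines = 0
        else:
            seq_lines += 1
            if seq_lines > 1:
                return False  # this record is wrapped over several lines
    return True


# A short first record on one line, followed by a wrapped record.
sample = ">short\nACGT\n>long\nACGTACGT\nACGTACGT\n"

first_only = is_unwrapped_fasta(io.StringIO(sample), max_records=1)
whole_file = is_unwrapped_fasta(io.StringIO(sample))
print(first_only, whole_file)  # True False
```

Checking the whole file is safe but costs a full pass, which defeats
much of the speed gain; a larger record limit only narrows the risk.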
As to the format name ... a name beginning 'fasta-' would be easiest to
document. For FASTQ we used fastq, fastq-sanger, fastq-solexa, and so on.
regards,
Peter Rice
EMBOSS Team
_______________________________________________
EMBOSS mailing list
http://lists.open-bio.org/mailman/listinfo/emboss