Suggestions please for a format name to describe fasta format with the
sequence always on a single line
(needed for output only - it will be valid as format 'fasta' for input).
Peter Rice
EMBOSS Team
On 27/08/2013 11:03, Niels Larsen wrote:
Yes, I meant both input and output. It would not be the default, so
hopefully no program would get a long-line surprise. The speed
advantage is a single read for the whole sequence, with no need
to remove newlines. Indexing sub-sequences with locators
becomes straightforward because the newlines don't get in the way. Most
genome packages use it, I think, including mine. Thanks, yes, I
thought it must be quite easy to do.
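
To illustrate the locator point, here is a minimal Python sketch (not
EMBOSS code). It assumes every record is exactly two lines, a '>' header
followed by the whole sequence, and the function names are made up for
this example. With that layout a sub-sequence is one seek plus one read:

```python
import io

def index_single_line_fasta(handle):
    """Map each ID to (byte offset of the first residue, sequence length).

    Assumes single-line FASTA: one '>' header line, then one sequence line.
    """
    index = {}
    while True:
        header = handle.readline()
        if not header:
            break                      # end of file
        if not header.startswith(b">"):
            continue                   # skip blank or stray lines
        seq_start = handle.tell()      # offset of the first residue
        seq_line = handle.readline()
        seq_id = header[1:].split()[0].decode("ascii")
        index[seq_id] = (seq_start, len(seq_line.rstrip(b"\r\n")))
    return index

def fetch(handle, index, seq_id, start, end):
    """Return residues start..end (0-based, end exclusive)."""
    seq_start, length = index[seq_id]
    end = min(end, length)             # never read past the sequence
    handle.seek(seq_start + start)     # no newline arithmetic needed
    return handle.read(max(0, end - start)).decode("ascii")

data = io.BytesIO(b">chr1 demo\nACGTACGTAC\n>chr2\nTTTTGGGGCC\n")
idx = index_single_line_fasta(data)
print(fetch(data, idx, "chr2", 4, 8))  # GGGG
```

With wrapped lines the same lookup would also need the line width to
skip the embedded newlines, which is the extra bookkeeping the
single-line layout avoids.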
Niels
On Tue, 2013-08-27 at 10:41 +0100, Peter Rice wrote:
On 27/08/2013 09:40, Niels Larsen wrote:
EMBOSS list,
I could not find a FASTA single-line sequence format. Is it
missing? Having the sequence on a single line does not
violate the FASTA format, I think, and many programs use it
for speed and indexing convenience.
You mean as an output format I assume? (it would be no problem for input).
Easy to implement, but it needs a name so you can specify
-osformat fastasingle (for example)
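
What such an output format would produce can be sketched in a few lines
of Python. This is only an illustration of the intended result, not how
EMBOSS would implement it, and the function name is hypothetical:

```python
import io

def to_single_line_fasta(src, dst):
    """Copy FASTA records from src to dst with each sequence on one line."""
    have_sequence = False
    for line in src:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if have_sequence:
                dst.write("\n")        # terminate the previous sequence
            dst.write(line + "\n")     # header is copied unchanged
            have_sequence = False
        elif line:
            dst.write(line)            # append residues, no line break
            have_sequence = True
    if have_sequence:
        dst.write("\n")                # terminate the last sequence

wrapped = io.StringIO(">seq1 example\nACGT\nACGT\nAC\n>seq2\nTTTT\nGG\n")
out = io.StringIO()
to_single_line_fasta(wrapped, out)
print(out.getvalue(), end="")
```

The result is still readable as ordinary FASTA, since a record whose
sequence happens to fit on one line is valid input.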
It can also be an issue for applications that fail to check for very
long input lines.
I don't see any real benefit for indexing - you only need to point to
the start of the ID line for that. Maybe there are applications that map
the sequence string and want to have no extra characters.
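
As a rough sketch of indexing by ID line (Python, with hypothetical
function names, not taken from any EMBOSS source): storing only the byte
offset of each '>' line is enough to fetch a whole record, whether or
not the sequence is wrapped.

```python
import io

def index_id_lines(handle):
    """Map each sequence ID to the byte offset of its '>' line."""
    index = {}
    offset = handle.tell()
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            fields = line[1:].split()
            if fields:                 # ignore headers with no ID
                index[fields[0].decode("ascii")] = offset
        offset = handle.tell()         # start of the next line
    return index

def read_record(handle, offset):
    """Return (header text, sequence) for the record starting at offset."""
    handle.seek(offset)
    header = handle.readline().rstrip(b"\r\n").decode("ascii")
    parts = []
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            break                      # next record begins
        parts.append(line.strip())     # drop the line breaks
    return header[1:], b"".join(parts).decode("ascii")

data = io.BytesIO(b">a first\nACGT\nAC\n>b\nTTTTGG\n")
idx = index_id_lines(data)
print(read_record(data, idx["b"]))     # ('b', 'TTTTGG')
```

This covers whole-record retrieval. Jumping straight to a position
inside a sequence is the case where the embedded newlines start to
matter.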
regards,
Peter Rice
EMBOSS Team