On 12/10/2012 22:27, Rozenbaum, Daniel (Biocceleration Inc) wrote:
Hello everyone,

We have encountered the following issue: if a list file contains an erroneous (most likely unintentional) entry that looks like "db:<space character>seqname", EMBOSS doesn't issue an
error or warning message, but treats the entry as "db:*".

Might it be possible, though, to add some protection against the potentially
problematic consequences if such an error is made in the USA? In one such
instance the resulting clustalw process ended up attempting to build a multiple
alignment across all of UniProt, which the server didn't handle well :-)

An interesting problem. List files have a long history, going back before EMBOSS. They were also used in the GCG (Wisconsin) package, which in turn adopted them from the VMS operating system, where they could be used for mailing lists (for example, sending to @list, where the file holds a list of usernames).

In a list file, only the first token (word) is significant. The remainder of the line is treated as a comment.

As you discovered, a space before the id leaves "db:" as the first token, and the id is then ignored as a comment. "db:" on its own (like a bare database name) is valid input representing all entries in the database.
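To make the behaviour concrete, here is a minimal Python sketch of first-token parsing. It is not EMBOSS code (EMBOSS itself is written in C); `parse_list_legacy` and `names_whole_database` are hypothetical names, skipping blank lines is an assumption, and treating a token that ends in ":" as the whole database simply mirrors the behaviour described above.

```python
def parse_list_legacy(text: str) -> list[str]:
    """Keep only the first whitespace-separated token of each line.

    Everything after the first token is treated as a comment.
    Skipping blank lines is an assumption of this sketch.
    """
    entries = []
    for line in text.splitlines():
        tokens = line.split()
        if tokens:
            entries.append(tokens[0])
    return entries


def names_whole_database(entry: str) -> bool:
    """Hypothetical check: a 'db:' token with nothing after the colon."""
    return entry.endswith(":")


listfile = "db:seqA\ndb: seqB\n"  # stray space after the colon on line 2
entries = parse_list_legacy(listfile)
print(entries)                                          # ['db:seqA', 'db:']
print([e for e in entries if names_whole_database(e)])  # ['db:']
```

The id "seqB" silently disappears into the "comment", and what remains selects every entry in the database.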

I think it is safe to assume that list files in practice have no comments, so we can make a simple change for the next release:

list:: indicates a list file with only one token per line. Any extraneous text will result in an error or warning message.

The same restriction will be applied to the VMS-style syntax @listfile.
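A strict one-token-per-line check could look roughly like the following Python sketch. It is an illustration under assumptions, not the planned EMBOSS change: `parse_list_strict` and `ListFileError` are hypothetical names, the sketch raises an error where EMBOSS might only warn, and it assumes blank lines remain allowed.

```python
class ListFileError(ValueError):
    """Raised when a strict list file line holds more than one token."""


def parse_list_strict(text: str) -> list[str]:
    """Accept exactly one token per non-blank line; reject anything extra."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        tokens = line.split()
        if not tokens:
            continue  # assumption: blank lines are still allowed
        if len(tokens) > 1:
            raise ListFileError(
                f"line {lineno}: expected one entry, found {len(tokens)} tokens: "
                f"{line.strip()!r}"
            )
        entries.append(tokens[0])
    return entries


try:
    parse_list_strict("db:seqA\ndb: seqB\n")
except ListFileError as err:
    print(err)  # line 2: expected one entry, found 2 tokens: 'db: seqB'
```

With this check, the stray space from the original report is caught before any database is read.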

A new list style can be added to allow comments, so that users whose list files contain comments can still use them.

Possibly a stricter comment style could be allowed in standard list:: files. We can check what other packages may have introduced, but something like a perl-style #comment could be simple to add. The # character has no special meaning in the EMBOSS query language.
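As a rough illustration of the #comment idea, this Python sketch strips everything from the first "#" on each line and then still requires at most one token. The function name `parse_list_with_comments` is hypothetical, and the exact comment rules are an assumption; it relies only on the point above that "#" has no special meaning in the query language.

```python
def parse_list_with_comments(text: str) -> list[str]:
    """One entry per line; text after '#' is a comment and is ignored."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        content = line.split("#", 1)[0]  # drop a Perl-style '#' comment
        tokens = content.split()
        if not tokens:
            continue  # blank line or comment-only line
        if len(tokens) > 1:
            raise ValueError(
                f"line {lineno}: extraneous text before comment: {content.strip()!r}"
            )
        entries.append(tokens[0])
    return entries


listfile = """\
# proteins for the alignment
db:seqA   # first entry
db:seqB
"""
print(parse_list_with_comments(listfile))  # ['db:seqA', 'db:seqB']
```

An entry such as "db: seqB # note" would still be rejected, because two tokens remain once the comment is removed.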

With those changes in place, your users would be protected from extra spaces, but they could of course still be caught by a newline creeping in after the database name and starting a new record (the first record then reads the entire database, and the id on the next line is read as a possible filename). Users will get an error message in that case, as long as the second part is not a valid filename or database name.

regards,

Peter Rice
EMBOSS Team


_______________________________________________
EMBOSS mailing list
EMBOSS@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss
