Michael Thon wrote:
I am setting up a database from GenBank formatted files. I understand how to index the db and configure the emboss.default file, but I don't know how to construct the queries. Queries for sequence IDs are pretty simple, i.e. with a USA of the format "dbname:id". But how do I create a query for the other fields, such as org and key? Also, do these fields support wildcards or substring matches or other fancy stuff?
Assuming you indexed all the fields (by default ID and ACC are indexed), you use the same syntax as in SRS. We saw no need to invent a new syntax, so we used the same field name abbreviations, but we did drop the '[]' around the query :-)

    dbname-acc:x13776
    dbname-org:pseudomonas*
    dbname-des:amidase
    dbname-key:
    dbname-sv:
    dbname-gi:

and, to complete the set,

    dbname-id:x13776

As you see, wildcards are allowed with '*' at the end.

We can make this much more sophisticated, allowing more wildcard options and combining queries. So far EMBOSS users have been content to use SRS or alternatives (MRS, for example). If there is interest, we can extend the USA to include wildcards, AND/OR/NOT, searches across multiple fields, and combined databases, and if we get really ambitious we could include links between databases. We will have to be careful to restrict some of these extensions to database access methods that support them.

Hope this helps,

Peter
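
For anyone scripting these lookups, here is a minimal sketch of how the "dbname-field:term" form described in the reply could be assembled in Python. The helper name field_query and the FIELDS set are hypothetical and not part of EMBOSS; the field abbreviations are just the ones listed above, and the check for a trailing '*' reflects only what the reply describes, not a full statement of what the USA parser accepts.

```python
import shlex

# Field abbreviations listed in the reply (the SRS-style names).
FIELDS = {"id", "acc", "org", "des", "key", "sv", "gi"}


def field_query(dbname, field, term):
    """Build a USA of the form dbname-field:term (hypothetical helper)."""
    field = field.lower()
    if field not in FIELDS:
        raise ValueError(f"unknown field: {field}")
    # Assumption: only a trailing '*' wildcard is available, as in the reply.
    if "*" in term[:-1]:
        raise ValueError("only a trailing '*' wildcard is supported")
    return f"{dbname}-{field}:{term}"


usa = field_query("dbname", "org", "pseudomonas*")
print(usa)
# Quote the USA before pasting it into a shell so '*' is not expanded
# as a filename glob.
print(shlex.quote(usa))
```

The quoting step matters only when the USA is typed or pasted into a shell command line; when the string is passed as a single argument from a program, no quoting is needed.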
