Thanks Peter, I got it working.
While I'm at it, a couple more questions popped up:
1) Do you know if these indexes are compatible with the Bio::DB::Registry type databases?
2) Is there any way to index and search sequence features?
Best
Mike


On Dec 12, 2007, at 12:21 PM, Peter Rice wrote:

Michael Thon wrote:
I am setting up a database from GenBank-formatted files. I understand how to index the database and configure the emboss.default file, but I don't know how to construct the queries. Queries for sequence IDs are pretty simple, i.e. a USA of the format "dbname:id". But how do I create a query for the other fields, such as org and key? Also, do these fields support wildcards, substring matches, or other fancy stuff?

Assuming you indexed all the fields (by default ID and ACC are indexed),
you use the same syntax as in SRS (we saw no need to invent a new
syntax, so we used the same field name abbreviations, but we did drop the
'[]' around the query :-)

dbname-acc:x13776
dbname-org:pseudomonas*
dbname-des:amidase
dbname-key:
dbname-sv:
dbname-gi:

and, to complete the set, dbname-id:x13776

As you see, wildcards are allowed with '*' at the end.
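
To make the pattern above concrete, here is a minimal Python sketch that assembles query strings of the form dbname-field:term. The helper name build_usa and its checks are hypothetical and not part of EMBOSS; the field list is taken only from the examples in this message, and the wildcard check reflects only the trailing '*' described here.

```python
# Field abbreviations as they appear in the examples in this thread.
# Assumption: this set is illustrative, not an authoritative EMBOSS list.
KNOWN_FIELDS = {"id", "acc", "org", "des", "key", "sv", "gi"}


def build_usa(dbname: str, field: str, term: str) -> str:
    """Return a query string of the form 'dbname-field:term'.

    Hypothetical helper: it only formats the string and does not
    contact any database or call any EMBOSS program.
    """
    field = field.lower()
    if field not in KNOWN_FIELDS:
        raise ValueError(f"unknown field abbreviation: {field!r}")
    if not term:
        raise ValueError("query term must not be empty")
    # Only a trailing '*' is described in this thread, so reject
    # a '*' anywhere else in the term.
    if "*" in term[:-1]:
        raise ValueError("'*' is only allowed at the end of the term")
    return f"{dbname}-{field}:{term}"


if __name__ == "__main__":
    for field, term in [("acc", "x13776"),
                        ("org", "pseudomonas*"),
                        ("des", "amidase")]:
        print(build_usa("dbname", field, term))
```

When passing such a string to an EMBOSS program on the command line, it is probably safest to quote it so that the shell does not try to expand the '*' itself.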

We can make this much more sophisticated, allowing more wildcard options and combining queries. So far EMBOSS users have been content to use SRS
or alternatives (MRS for example).

If there is interest, we can extend the USA to include wildcards,
AND/OR/NOT, search multiple fields, combine databases, and if we get
really ambitious we could include links between databases.

We will have to be careful to restrict some of these extensions to
database access methods that support them.

Hope this helps,

Peter

_______________________________________________
EMBOSS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/emboss
