Thanks Peter, I got it working.
While I'm at it, a couple more questions popped up:
1) Do you know if these indexes are compatible with the Bio::DB::Registry
type databases?
2) Is there any way to index and search sequence features?
Best
Mike
On Dec 12, 2007, at 12:21 PM, Peter Rice wrote:
Michael Thon wrote:
I am setting up a database from Genbank formatted files. I
understand how to index the db and configure the emboss.default
file, but I don't know how to construct the queries. Queries for
sequence IDs are pretty simple, i.e. with a USA of the format
"dbname:id". But how do I create a query for the other fields,
such as org and key? Also, do these fields support wildcards or
substring matches or other fancy stuff?
Assuming you indexed all the fields (by default ID and ACC are indexed),
you use the same syntax as in SRS (we saw no need to invent a new
syntax, so we used the same field name abbreviations, but we did drop
the '[]' around the query :-)
dbname-acc:x13776
dbname-org:pseudomonas*
dbname-des:amidase
dbname-key:
dbname-sv:
dbname-gi:
and, to complete the set, dbname-id:x13776
As you see, wildcards are allowed with '*' at the end.
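
To make the pattern above concrete, here is a minimal Python sketch that only assembles query strings of the form "dbname-field:value". It is not part of EMBOSS: the helper name build_usa and the FIELDS set are hypothetical, the field abbreviations are just the ones listed above, and the rule that '*' may only appear at the end is taken from this message rather than from the EMBOSS source.

```python
# Hypothetical helper, NOT part of EMBOSS: it only builds query strings.
# Field abbreviations are the ones listed in this thread.
FIELDS = {"id", "acc", "org", "des", "key", "sv", "gi"}


def build_usa(dbname, field, value):
    """Return a USA string of the form 'dbname-field:value'."""
    field = field.lower()
    if field not in FIELDS:
        raise ValueError(f"unknown field: {field!r}")
    # Assumption from this thread: '*' is only allowed at the end.
    if "*" in value[:-1]:
        raise ValueError("'*' is only allowed at the end of the value")
    return f"{dbname}-{field}:{value}"


if __name__ == "__main__":
    for field, value in [("acc", "x13776"),
                         ("org", "pseudomonas*"),
                         ("des", "amidase")]:
        print(build_usa("dbname", field, value))
```

The resulting string would be given to an EMBOSS program wherever it expects a sequence. When typing a wildcard query on a shell command line, quote it so the shell does not try to expand the '*' itself.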
We can make this much more sophisticated, allowing more wildcard
options and combining queries. So far EMBOSS users have been content
to use SRS or alternatives (MRS for example).
If there is interest, we can extend the USA to include wildcards,
AND/OR/NOT, search multiple fields, combine databases, and if we get
really ambitious we could include links between databases.
We will have to be careful to restrict some of these extensions to
database access methods that support them.
Hope this helps,
Peter
_______________________________________________
EMBOSS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/emboss