On Fri, Mar 15, 2002 at 01:51:07PM -0500, Marc Colosimo wrote:
> Thomas Down <[EMAIL PROTECTED]> wrote:
>
> > Hi...
> >
> > I'm considering adding a filter(FeatureFilter); method to
> > SequenceDB, which allows features to be extracted from a
> > whole database, rather than just a single sequence. Typical
> > usage would be:
> >
> >   SequenceDB seqDB = ...
> >   FeatureHolder mygene = seqDB.filter(
> >       new FeatureFilter.ByAnnotation("gene.id", "BRCA2")
> >   );
>
> Would this return the feature as some sort of generic gene.id feature? My
> growing concern is that for each file/db/SQL format we are adding features with
> their original names rather than some defined BioJava enforced named feature. I
> noticed a dtd for features. Unfortunately, I don't know much about XML besides
> the simple things. Could we make something like gene_id, accession_no, etc...
> ? By using these set names, you don't have to know what a gene_id tag is for
> EMBL, genbank, SQL,.......
>
> Or have I missed this ability in BioJava somehow?

No, your concern is quite justified. It is, indeed, necessary to have
some specialized knowledge about a particular data source before you
can really make use of the tag-value data present in the Annotation
bundles.

I think a set of `common' key names would be a big help, and I'd
welcome any proposals for what should be in here (the standard set
of feature types and qualifiers from EMBL might be a good starting
point, but probably not a complete solution). I'd also like to be
able to introspect, for a given database, what properties I should
expect to find on features. The AnnotationType objects, written
recently by Matthew, ought to be one part of the puzzle.

Even before this problem is solved, the filter-all-features-in-a-database
operator still seems to me to be useful -- and I can't see any way
in which it should make improved standardization and `introspectability'
harder in the future. Or am I missing something?

Thomas.
_______________________________________________
Biojava-l mailing list - [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l
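
To make the two ideas in this thread a little more concrete, here is a rough sketch of how a database-wide filter and a table of common key names might fit together. None of this is the BioJava API: the DbFilterSketch class, its Feature type, the COMMON_KEYS table, and the source-specific keys "gene" and "GENE_ID" are all hypothetical stand-ins invented for illustration, and the database is modelled as a plain map from sequence id to features.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-ins for illustration only; this is NOT the BioJava API.
class DbFilterSketch {

    /** A feature reduced to its parent sequence id and tag-value annotation. */
    static final class Feature {
        final String sequenceId;
        final Map<String, String> annotation;

        Feature(String sequenceId, Map<String, String> annotation) {
            this.sequenceId = sequenceId;
            this.annotation = annotation;
        }
    }

    /** Assumed mapping from source-specific keys to one common key name. */
    static final Map<String, String> COMMON_KEYS = new HashMap<>();
    static {
        COMMON_KEYS.put("gene", "gene.id");     // illustrative flat-file qualifier
        COMMON_KEYS.put("GENE_ID", "gene.id");  // illustrative SQL column name
    }

    /** Copy an annotation, renaming any key that has a common name. */
    static Map<String, String> normalize(Map<String, String> raw) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            out.put(COMMON_KEYS.getOrDefault(e.getKey(), e.getKey()), e.getValue());
        }
        return out;
    }

    /** Filter matching features whose annotation has the given key and value. */
    static Predicate<Feature> byAnnotation(String key, String value) {
        return f -> value.equals(f.annotation.get(key));
    }

    /** Collect matching features from every sequence in the database. */
    static List<Feature> filter(Map<String, List<Feature>> db, Predicate<Feature> p) {
        List<Feature> hits = new ArrayList<>();
        for (List<Feature> features : db.values()) {
            for (Feature f : features) {
                if (p.test(f)) {
                    hits.add(f);
                }
            }
        }
        return hits;
    }

    /** Build a feature with a single, normalized annotation entry. */
    static Feature feature(String seqId, String key, String value) {
        Map<String, String> raw = new HashMap<>();
        raw.put(key, value);
        return new Feature(seqId, normalize(raw));
    }

    /** Two sequences whose features came from differently-keyed sources. */
    static Map<String, List<Feature>> exampleDb() {
        Map<String, List<Feature>> db = new LinkedHashMap<>();
        List<Feature> a = new ArrayList<>();
        a.add(feature("seqA", "gene", "BRCA2"));
        a.add(feature("seqA", "gene", "TP53"));
        List<Feature> b = new ArrayList<>();
        b.add(feature("seqB", "GENE_ID", "BRCA2"));
        db.put("seqA", a);
        db.put("seqB", b);
        return db;
    }

    public static void main(String[] args) {
        // One query against the common key, regardless of the original source.
        for (Feature f : filter(exampleDb(), byAnnotation("gene.id", "BRCA2"))) {
            System.out.println(f.sequenceId + " " + f.annotation);
        }
    }
}
```

In this sketch the renaming happens once, when a feature is built, so the filter itself never needs to know which source a feature came from. That is only one possible arrangement; the filter could equally be applied to un-normalized annotations, which is why the two problems can be tackled separately.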