Melanie! Thanks for your thoughts. You are right about the mess that mapping to different ontologies and vocabularies produces. We have been working on integrating explicitly with BioPAX (http://www.biopax.org/, using the states and generics proposal, since level 2 was too limiting) in the hope that other databases (like GO, Reactome, KEGG, DIP, Signalling Gateway, etc.) get dragged along too. Some seem to be.
At the moment, I think the cleanup begins at the repository interface. So long as we validate that the biological annotation is unambiguous, complete, and represented in a way that agrees with one or more of an accepted set of ontologies and vocabularies, then I think we might be in a position to perform useful queries. We also have a person doing a PhD here now on the visualisation of CellML models, which relies solely on the biological annotation. It's a pretty good failure point to get nothing in your picture :-)

thanks again
cheers
Matt

On 3/29/07, Melanie Nelson <[EMAIL PROTECTED]> wrote:
> Wow, I haven't posted to this list in a long time...
> But I feel compelled to give a little advice as someone who's spent
> a lot of time integrating biological information and therefore has
> made a lot of mistakes!
>
> By all means, have a best practice encouraging people to use the GO
> cellular_component ontology to describe organelles and cells. You
> could probably also use the molecular_function ontology for proteins
> (although this will be messier). However, neither is likely to be
> complete, i.e., there will be models that reference a biological
> entity not in the GO ontologies. Also, there will be cases where the
> entity the model references is most properly thought of as related
> in some way (e.g., a subset, a superset, or a "sibling") to the GO
> entity. You can spend ages sorting this sort of thing out and coming
> up with consistent rules for handling all the relationships.
>
> Since you aren't really interested in sorting out this biological
> mess, you may want to consider letting people choose their own
> ontology and just reference it.
> An example of this practice is in the MIAME project:
> http://www.mged.org/Workgroups/MIAME/miame_1.1.html
>
> About the citations: my memory of this is fuzzy, but I think the
> original intent was that people should provide the PubMed ID where
> possible.
> However, not all journals are indexed in PubMed (for instance, there
> is a CellML paper published in one that is not), so the model needs
> to handle full citation info, too. The BQS model handles both, and
> then some, which is why we chose it.
>
> Hope this is helpful,
> Melanie
>
> --- Andrew Miller <[EMAIL PROTECTED]> wrote:
>
> > Matt wrote:
> > > I don't think this is a good idea.
> > >
> > > - I think bioentity should be deprecated, it has no intrinsic
> > > semantic value.
> > >
> > It does, unfortunately, seem to usually target a literal node at
> > the moment. It would be nice for this to at least be a resource,
> > which could provide further information about the biological
> > entity (or if we decide not to do that, at least a resource, with
> > a dictionary and a process for adding new words to the dictionary
> > to avoid duplication).
> >
> > It seems that GO (Gene Ontology) has terms for cell types,
> > biological compartments, and so on, which would offer a better way
> > to provide this information.
> >
> > I still think that this metadata is useful, even if the automated
> > interpretation of it is currently difficult.
> > > - If it is used currently, it should be left as its current
> > > minimum specification, which is to label and point to other
> > > bioinformatics database IDs.
> > >
> > There are three layers of information here:
> > Layer 1: What biological entity are we describing? (could be
> > answered with a GO term).
> > Layer 2: What information about that biological entity are we
> > using? (could be answered with a reference to a paper, and perhaps
> > even a reference to raw experimental data).
> > Layer 3: How was that information translated into a model? (could
> > be answered with a reference to a paper on the model).
> >
> > Layer 3 is clearly information about the model, and should be
> > described as an arc of the model resource.
> > Layer 1 is described by a literal at the moment.
> >
> > Layer 2 is therefore a gap, which we don't have any proper way to
> > represent now.
> > > - The problem is not 'biologically related papers' per se, but
> > > one of identifying what was the primary publication or
> > > publications that motivated a model.
> > >
> > The publication which motivated the expression of a model in
> > CellML, or the publication which motivated the creation of the
> > model? Most of the models in the repository were motivated by a
> > paper about a model which was not initially expressed in CellML.
> > However, the way that the metadata specification works now is that
> > the paper which describes the model (not the paper which motivated
> > it) is referenced from the information about the model (not
> > information about the CellML file).
> > > - There is also the case where a single publication that
> > > contains a mathematical model is the one and only primary source
> > > for the model itself - a rather common case at the moment.
> > >
> > This is what most models in CellML should aim to attain. Models
> > can be submitted prior to publication as a model, but the step of
> > going from the biology to a model is something which does need
> > peer review.
> > > I would prefer that the primary publication(s) be identified as
> > > such, which covers the case where there are some models in the
> > > repository built from general review papers of biology with no
> > > math.
> > >
> > If a model is built in that way, it should reference the review
> > papers as information about the biology, and the author should
> > ideally submit it for publication, at which point the reference to
> > the paper could be filled in.
> > > I would prefer references to other related publications to be
> > > bound explicitly to a comment in the model metadata - there
> > > should be a reason identified by the author/editor/reviewer as
> > > to why there has been such an association made.
> > >
> > The problem with this is that the comment is not machine readable,
> > so there is then no way to get aggregate statistics on why models
> > are linked. There is also a potential for significant duplication
> > of information, as opposed to a set of standardised predicate
> > terms for linking to a set of models.
> > > As an aside, we also need to determine whether the BQS schema
> > > provides enough detail to match publications across metadata
> > > instances for different models, and whether we should be
> > > complementing bibliographic data with PubMed IDs and the like.
> > >
> > I think that the PUBMED ID is always useful, because it allows
> > CellML processing software (e.g. the repository) to link directly
> > to the Entrez / PUBMED page. We could build links based on
> > searches for authors and titles, but a unique ID is much cleaner.
> > It seems that many repository models do have PUBMED IDs on them.
> >
> > Best regards,
> > Andrew
> >
> > _______________________________________________
> > cellml-discussion mailing list
> > [email protected]
> > http://www.cellml.org/mailman/listinfo/cellml-discussion
