Melanie!

Thanks for your thoughts. You are right about the mess that mapping to
different ontologies and vocabs produces. We have been working on
integrating explicitly with BioPAX (http://www.biopax.org/; the
states and generics proposal, since level 2 was too limiting) in the
hope that other databases (like GO, Reactome, KEGG, DIP, signalling
gateway, etc.) get dragged along too. Some seem to be.

At the moment, I think the cleanup begins at the repository interface.
So long as we validate that the biological annotation is unambiguous,
complete, and represented in a way that agrees with one or more of an
accepted set of ontologies and vocabs, then I think we might be in a
position to perform useful queries.
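
A rough sketch of what such a check at the repository interface might
look like. Everything here is an assumption for illustration: the
`Annotation` record, the `ACCEPTED_VOCABS` set, and the three rules are
hypothetical and are not taken from the repository code.

```python
from dataclasses import dataclass

# Hypothetical: the vocabularies the repository has agreed to accept.
ACCEPTED_VOCABS = {"GO", "BioPAX"}


@dataclass(frozen=True)
class Annotation:
    label: str    # name the model author used, e.g. "mitochondrion"
    vocab: str    # vocabulary the term comes from, e.g. "GO"
    term_id: str  # identifier within that vocabulary


def validate(annotations):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if not annotations:
        # Incomplete: nothing for a query or a visualisation to work from.
        problems.append("no biological annotation present")
    seen = {}  # (label, vocab) -> first term_id seen for that pair
    for a in annotations:
        if a.vocab not in ACCEPTED_VOCABS:
            problems.append(f"{a.label}: vocabulary {a.vocab!r} is not accepted")
        if not a.term_id:
            problems.append(f"{a.label}: missing term identifier")
        # Ambiguous: one label pointing at two different terms in one vocabulary.
        first = seen.setdefault((a.label, a.vocab), a.term_id)
        if first != a.term_id:
            problems.append(f"{a.label}: maps to both {first} and {a.term_id}")
    return problems
```

A model would only be accepted when `validate(...)` returns an empty
list; the returned messages could go straight back to the submitter.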

We also have a person doing a PhD here now on the visualisation of
CellML models which relies solely on the biological annotation.  It's
a pretty good failure point to get nothing in your picture :-)

thanks again
cheers
Matt

On 3/29/07, Melanie Nelson <[EMAIL PROTECTED]> wrote:
> Wow, I haven't posted to this list in a long time...
> But I feel compelled to give a little advice as
> someone who's spent a lot of time integrating
> biological information and therefore has made a lot of
> mistakes!
>
> By all means, have a best practice encouraging people
> to use the GO cellular_component ontology to describe
> organelles and cells. You could probably also use the
> molecular_function ontology for proteins (although
> this will be messier). However, neither is likely to
> be complete, i.e., there will be models that
> reference a biological entity not in the GO
> ontologies. Also, there will be cases where the entity
> the model references is most properly thought of as
> related in some way (e.g., a subset, a superset, or a
> "sibling") to the GO entity. You can spend ages
> sorting this sort of thing out and coming up with
> consistent rules for handling all the relationships.
>
>
> Since you aren't really interested in sorting out this
> biological mess, you may want to consider letting
> people choose their own ontology and just reference
> it.
> An example of this practice is in the MIAME project:
> http://www.mged.org/Workgroups/MIAME/miame_1.1.html
>
> About the citations- my memory of this is fuzzy, but I
> think the original intent was that people should
> provide the PubMed ID where possible. However, not all
> journals are indexed in PubMed (for instance, there is
> a CellML paper published in one that is not), so the
> model needs to handle full citation info, too. The BQS
> model handles both, and then some, which is why we
> chose it.
>
> Hope this is helpful,
> Melanie
>
>
> --- Andrew Miller <[EMAIL PROTECTED]> wrote:
>
> > Matt wrote:
> > > I don't think this is a good idea.
> > >
> > > - I think bioentity should be deprecated; it has
> > > no intrinsic semantic value.
> > >
> > It does, unfortunately, seem to usually target a
> > literal node at the
> > moment. It would be nice for this to at least be a
> > resource, which could
> > provide further information about the biological
> > entity (or if we decide
> > not to do that, at least a resource, with a
> > dictionary and a process for
> > adding new words to the dictionary to avoid
> > duplication).
> >
> > It seems that GO (Gene Ontology) has terms for cell
> > types, biological
> > compartments, and so on, which would offer a better
> > way to provide this
> > information.
> >
> > I still think that this metadata is useful, even if
> > the automated
> > interpretation of it is currently difficult.
> > > - If it is used currently, it should be left as
> > its current minimum
> > > specification which is to label and point to other
> > bioinformatics
> > > database IDs.
> > >
> > There are three layers of information here:
> > Layer 1: What biological entity are we describing?
> > (could be answered
> > with a GO term).
> > Layer 2: What information about that biological
> > entity are we using?
> > (could be answered with a reference to a paper, and
> > perhaps even a
> > reference to raw experimental data).
> > Layer 3: How was that information translated into a
> > model (could be
> > answered with a reference to a paper on the model).
> >
> > Layer 3 is clearly information about the model, and
> > should be described
> > as an arc of the model resource.
> > Layer 1 is described by a literal at the moment.
> >
> > Layer 2 is therefore a gap, which we don't have any
> > proper way to
> > represent now.
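
One way to picture the three layers above is as a small record with a
check for gaps. This is only a sketch: `ModelAnnotation`, its field
names, and `missing_layers` are hypothetical and do not come from the
CellML metadata specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelAnnotation:
    # Layer 1: which biological entity is described (e.g. a GO term ID).
    entity_term: Optional[str] = None
    # Layer 2: sources for the biological information used
    # (papers, possibly raw experimental data).
    evidence_refs: List[str] = field(default_factory=list)
    # Layer 3: the paper describing how that information became a model.
    model_paper: Optional[str] = None


def missing_layers(a: ModelAnnotation) -> List[int]:
    """Return the layer numbers (1, 2, 3) that have no information."""
    missing = []
    if not a.entity_term:
        missing.append(1)
    if not a.evidence_refs:
        missing.append(2)
    if not a.model_paper:
        missing.append(3)
    return missing
```

With a record like this, the gap described for Layer 2 shows up as a
`2` in the result whenever no evidence references are attached.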
> > > - The problem is not 'biologically related
> > > papers' per se, but one of
> > > identifying what was the primary publication or
> > publications that
> > > motivated a model.
> > >
> > The publication which motivated the expression of a
> > model in CellML, or
> > the publication which motivated the creation of the
> > model? Most of the
> > models in the repository were motivated by a paper
> > about a model which
> > was not initially expressed in CellML. However, the
> > way that the
> > metadata specification works now is that the paper
> > which describes the
> > model (not the paper which motivated it) is
> > referenced from the
> > information about the model (not information about
> > the CellML file).
> > > - There is also the case where a single
> > publication that contains a
> > > mathematical model is the one and only primary
> > source for the model
> > > itself - a rather common case at the moment.
> > >
> > This is what most models in CellML should aim to
> > attain. Models can be
> > submitted prior to publication as a model, but the
> > step of going from
> > the biology to a model is something which does need
> > peer review.
> > > I would prefer that the primary publication(s) be
> > identified as such,
> > > which covers the case where there are some
> > models in the repository
> > > built from general review papers of biology with
> > no math.
> > >
> > If a model is built in that way, it should reference
> > the review papers
> > as information about the biology, and the author
> > should ideally submit
> > it for publication, at which point the reference to
> > the paper could be
> > filled in.
> > > I would prefer references to other related
> > publications to be bound
> > > explicitly to a comment in the model metadata -
> > there should be a
> > > reason identified by the author/editor/reviewer as
> > to why there has
> > > been such an association made.
> > >
> > The problem with this is that the comment is not
> > machine readable, so
> > there is then no way to get aggregate statistics on
> > why models are
> > linked. There is also a potential for significant
> > duplication of
> > information, as opposed to a set of standardised
> > predicate terms for
> > linking to a set of models.
> > > As an aside, we also need to determine whether the
> > bqs schema provides
> > > enough detail to match publications across
> > metadata instances for
> > > different models, and whether we should be
> > complementing bibliographic
> > > data with pubmed Ids and the like.
> > >
> > I think that the PUBMED ID is always useful, because
> > it allows CellML
> > processing software (e.g. the repository) to link
> > directly to the Entrez
> > / PUBMED page. We could build links based on
> > searches for authors and
> > titles, but a unique ID is much cleaner. It seems
> > that many repository
> > models do have PUBMED IDs on them.
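
As a sketch of why a unique ID is cleaner than an author/title search:
turning a PubMed ID into a link is a single string operation. The
function name `pubmed_url` is hypothetical, and the base URL is an
assumption about the current PubMed address pattern, not something the
repository is known to use.

```python
from typing import Optional

# Assumed base address for PubMed record pages.
PUBMED_BASE = "https://pubmed.ncbi.nlm.nih.gov/"


def pubmed_url(pmid: Optional[str]) -> Optional[str]:
    """Return a link for a PubMed ID, or None if no usable ID is given."""
    if pmid is None:
        return None
    pmid = pmid.strip()
    # PubMed IDs are plain decimal integers; reject anything else.
    if not (pmid.isascii() and pmid.isdigit()):
        return None
    return f"{PUBMED_BASE}{pmid}/"
```

When this returns `None` (for example, for a journal not indexed in
PubMed), the software would fall back to displaying the full citation.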
> >
> > Best regards,
> > Andrew
> >
> > _______________________________________________
> > cellml-discussion mailing list
> > [email protected]
> >
> http://www.cellml.org/mailman/listinfo/cellml-discussion
> >
>