Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?

Matt Thu, 29 Mar 2007 12:05:27 -0800

On 3/29/07, Nicolas Le Novere <[EMAIL PROTECTED]> wrote:
> On Thu, 29 Mar 2007, Matt  wrote:
>
> > Can you explain in more detail or point to explanations of
> > bqmodel:isDescribedBy?
>
> You can find some explanations at:
>
> http://www.ebi.ac.uk/compneur-srv/miriam-main/mdb?section=qualifiers



So there is no simple way to determine whether this is a reference to a
journal article except by interpreting the URI?


>
> Note that qualifiers are optional for MIRIAM compliance. I personally
> think we should always use some qualification; otherwise an annotation
> becomes very difficult to use except for jumping from webpage to
> webpage.
>
> > Specifically:
> > - what is its intended meaning?
>
> Cf. above. Note that the list of qualifiers is by no means frozen. We
> are already aware of several gaps (e.g. how do we qualify the relation
> between a peptide and the gene that encodes it?).
>
> > - when more than one of these is defined on a resource, how is this
> > interpreted? For example: is there some precedence implied somehow?
>
> This is up to the "tool" using the qualifiers. SBML does not allow
> nested qualifications. There is only an implicit "hasVersion" if several
> identical qualifiers are present:
>
> bqmodel:isDescribedBy toto
> bqmodel:isDescribedBy tata
>
> means "is described by toto" and "is described by tata". In other words,
> toto or tata describes the component.
>
> It does NOT mean that toto and tata are both necessary to describe the component.
>
> On top of that, BioModels DB adds some precedence:
> http://www.ebi.ac.uk/compneur-srv/biomodels/doc/annotation.html
>
> But all that is not part of MIRIAM rules.
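A minimal Python sketch of how a tool might apply the rule above, reading repeated identical qualifiers as alternatives rather than as joint requirements. The function names are hypothetical, "toto" and "tata" are the placeholders from the example, and the extra precedence that BioModels DB adds is deliberately not modelled.

```python
from collections import defaultdict

def group_alternatives(pairs):
    """Group (qualifier, resource) pairs by qualifier (hypothetical helper).

    Each resulting set is read as a disjunction: any one resource in it
    describes the component on its own; all of them are NOT required.
    """
    groups = defaultdict(set)
    for qualifier, resource in pairs:
        groups[qualifier].add(resource)
    return dict(groups)

def is_alternative(groups, qualifier, resource):
    # A single member of the group is enough to satisfy the qualifier.
    return resource in groups.get(qualifier, set())

pairs = [
    ("bqmodel:isDescribedBy", "toto"),
    ("bqmodel:isDescribedBy", "tata"),
]
groups = group_alternatives(pairs)
print(sorted(groups["bqmodel:isDescribedBy"]))  # ['tata', 'toto']
```

Using a set per qualifier keeps the interpretation flat, matching the note that SBML does not allow nested qualifications.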
>
> > - how do you determine the kind of reference it is - for example a
> > pubmed uri? You have a datatype for vocab/database IDs in the
> > annotation scheme you described, but I don't see this in the
> > bqmodel:isDescribedBy examples.
>
> <rdf:li rdf:resource="http://www.pubmed.gov/#8983160"/>
>
> http://www.pubmed.gov/ means "the following identifier has to be
> interpreted as pointing to a PubMed record".
>
> http://www.pubmed.gov/ is unique and should not normally
> change. However, it may nevertheless change for various reasons:
> the URI is too confusing, it was badly chosen, two resources were
> merged, etc. For instance, the old PubMed URI was
> http://www.ncbi.nlm.nih.gov/PubMed/
> It was misleading because it was tied to a particular physical
> resource at the NCBI.
>
> We have a deprecation system in place that resolves the old URIs
> and provides the new ones.
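To answer Matt's question above concretely: a tool does have to interpret the URI to learn that it points to PubMed. A minimal Python sketch, assuming the "data-type URI + # + identifier" form used in this thread, and assuming (not confirmed here) that deprecated URIs take the same form. The DEPRECATED table and the function names are hypothetical; this is not MIRIAM's actual deprecation service.

```python
# Hypothetical deprecation table: old data-type URI -> current data-type URI.
# Only the PubMed pair is mentioned in the discussion.
DEPRECATED = {
    "http://www.ncbi.nlm.nih.gov/PubMed/": "http://www.pubmed.gov/",
}

def split_annotation_uri(uri):
    """Split e.g. 'http://www.pubmed.gov/#8983160' into (data type, identifier)."""
    data_type, sep, identifier = uri.partition("#")
    if not sep or not identifier:
        raise ValueError(f"no '#identifier' part in {uri!r}")
    # Replace a deprecated data-type URI with its successor, if one is known.
    data_type = DEPRECATED.get(data_type, data_type)
    return data_type, identifier

def is_pubmed_reference(uri):
    data_type, _ = split_annotation_uri(uri)
    return data_type == "http://www.pubmed.gov/"

print(split_annotation_uri("http://www.pubmed.gov/#8983160"))
print(split_annotation_uri("http://www.ncbi.nlm.nih.gov/PubMed/#8983160"))
```

Both calls yield the same (data type, identifier) pair, which is the point of resolving old URIs before comparing annotations.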
>
>
> > - how would you address auxiliary references as opposed to primary
> > references so that a machine interpreting it can make the distinction?
>
> I am not sure I understand that. Like primary and secondary accessions of 
> UniProt?

For journal articles or other publications, being able to identify
the primary reference(s) is useful. For database records, it would
also be useful to label one group as the most important (or defining)
set, and others as 'helpful'. That was why I suggested that CellML
bibliographic referencing separate these two, and that the latter be
bound to a reason (a natural-language comment would be fine) describing
why that reference was made.

>
> >
> > <snip>
> >>
> >> I entirely agree with Melanie: people should be able to pick the
> >> resource they want, as long as they uniquely identify it. This is
> >> clearly described in the MIRIAM paper.
> >
> > I'm not sure what benefits one gains from letting people arbitrarily
> > choose what they want to use to identify something with. For example,
> > how do you work out whether particular entities in one SBML model match
> > entities in another SBML model?
> >
> > Also, given that most of these resources are controlled vocabularies,
> > there is a lot of room for misunderstanding someone's intention when
> > using their choices of identifiers.
> >
> >
> >
> >> An annotation is formed of
> >> three parts:
> >>
> >> The data-type, e.g. PubMed entry, DOI, GO term, Cell-type ontology term ...
> >>
> >> The identifier of the particular information, e.g. 123456789, GO:0001234 
> >> ...
> >>
> >> An optional qualifier that describes the relationship between the concept 
> >> represented by the model component and the concept represented by the 
> >> particular information.
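The three-part structure just described maps naturally onto a small record type. A minimal Python sketch; the class and field names are hypothetical, and the "data type#identifier" URI form is the one shown in the BioModels DB examples in this thread (other data types, such as GO terms, may be written differently).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Annotation:
    data_type: str                   # data-type URI, e.g. "http://www.pubmed.gov/"
    identifier: str                  # e.g. "8983160"
    qualifier: Optional[str] = None  # optional, e.g. "bqmodel:isDescribedBy"

    def uri(self) -> str:
        # Join data type and identifier in the 'datatype#identifier' form
        # used in the examples in this thread.
        return f"{self.data_type}#{self.identifier}"

ref = Annotation("http://www.pubmed.gov/", "8983160", "bqmodel:isDescribedBy")
print(ref.uri())  # http://www.pubmed.gov/#8983160
```

Making the qualifier an optional field mirrors MIRIAM's rule that qualification is not required for compliance.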
> >>
> >> To help people implement that, we developed MIRIAM resources
> >> (http://www.ebi.ac.uk/compneur-srv/miriam/).
> >>
> >> If you download a model from BioModels DB in SBML (not in CellML at
> >> the moment, for obvious reasons highlighted by the current
> >> discussion), you will see something like:
> >>
> >> <bqmodel:isDescribedBy>
> >> <rdf:Bag>
> >> <rdf:li rdf:resource="http://www.pubmed.gov/#8983160"/>
> >> </rdf:Bag>
> >> </bqmodel:isDescribedBy>
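A minimal Python sketch of how a tool might read such a block with the standard library's ElementTree. The quoted snippet omits its namespace declarations and enclosing element, so the wrapper below is an assumption: the RDF namespace is the standard W3C one, the bqmodel namespace URI and the rdf:about value "#model" are assumed for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs: RDF is the W3C standard; BQMODEL is an assumption here.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQMODEL = "http://biomodels.net/model-qualifiers/"

SNIPPET = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:bqmodel="{BQMODEL}">
  <rdf:Description rdf:about="#model">
    <bqmodel:isDescribedBy>
      <rdf:Bag>
        <rdf:li rdf:resource="http://www.pubmed.gov/#8983160"/>
      </rdf:Bag>
    </bqmodel:isDescribedBy>
  </rdf:Description>
</rdf:RDF>
"""

def described_by(xml_text):
    """Return every rdf:resource listed in a bqmodel:isDescribedBy bag."""
    root = ET.fromstring(xml_text.strip())
    uris = []
    for qualifier in root.iter(f"{{{BQMODEL}}}isDescribedBy"):
        # Clark notation ({namespace}local) addresses namespaced elements.
        for li in qualifier.findall(f"{{{RDF}}}Bag/{{{RDF}}}li"):
            uris.append(li.get(f"{{{RDF}}}resource"))
    return uris

print(described_by(SNIPPET))  # ['http://www.pubmed.gov/#8983160']
```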
> >>
> >> But on the webpage, there is:
> >>
> >> <b>Publication ID:</b>&nbsp;<a
> >> href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=8983160"
> >> target="_blank">8983160</a>
> >>
> >> The URL is dynamically generated by MIRIAM web services. In fact, in the
> >> new version of BioModels DB, to be released in the fall, the URL no
> >> longer points to PubMed but to the more comprehensive EBI extended
> >> Medline. BUT the URI stored in the model is still the SAME.
> >>
> >> Similarly for a DOI:
> >>
> >> <bqmodel:isDescribedBy>
> >> <rdf:Bag>
> >> <rdf:li rdf:resource="http://www.doi.org/#10.1063/1.1681288"/>
> >> </rdf:Bag>
> >> </bqmodel:isDescribedBy>
> >>
> >> is transformed into:
> >>
> >> <b>Publication ID:</b>&nbsp;<a href="http://dx.doi.org/10.1063/1.1681288"
> >> target="_blank">10.1063/1.1681288...</a>
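The separation between the stored URI and the displayed URL can be sketched as a lookup table applied at display time. This is a minimal Python sketch, not MIRIAM's web service: URL_TEMPLATES and resolve are hypothetical names, and the two templates simply reproduce the PubMed and DOI links shown above.

```python
# Hypothetical resolver table: data-type URI -> URL template.
# The templates reproduce the two links shown in this thread.
URL_TEMPLATES = {
    "http://www.pubmed.gov/":
        "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve"
        "&db=pubmed&dopt=Abstract&list_uids={id}",
    "http://www.doi.org/": "http://dx.doi.org/{id}",
}

def resolve(uri):
    """Turn a stored annotation URI into a browsable URL."""
    data_type, sep, identifier = uri.partition("#")
    if not sep or data_type not in URL_TEMPLATES:
        raise KeyError(f"no resolver for {uri!r}")
    return URL_TEMPLATES[data_type].format(id=identifier)

print(resolve("http://www.doi.org/#10.1063/1.1681288"))
# http://dx.doi.org/10.1063/1.1681288
```

Redirecting PubMed identifiers to another service (such as the EBI extended Medline) would only mean editing the table entry; the URI stored in the model stays the same.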
> >>
> >> That system is very flexible. You can use any resource listed in
> >> MIRIAM resources, and this resource can be extended at will (note that
> >> we distribute an XML version of the resource for local use). Yet it is
> >> still robust and expressive.
> >>
> >> Cheers,
> >>
> >> On Wed, 28 Mar 2007, Melanie Nelson wrote:
> >>
> >>> Wow, I haven't posted to this list in a long time...
> >>> But I feel compelled to give a little advice as
> >>> someone who's spent a lot of time integrating
> >>> biological information and therefore has made a lot of
> >>> mistakes!
> >>>
> >>> By all means, have a best practice encouraging people
> >>> to use the GO cellular_component ontology to describe
> >>> organelles and cells. You could probably also use the
> >>> molecular_function ontology for proteins (although
> >>> this will be messier). However, neither is likely to
> >>> be complete, i.e., there will be models that
> >>> reference a biological entity not in the GO
> >>> ontologies. Also, there will be cases where the entity
> >>> the model references is most properly thought of as
> >>> related in some way (e.g., a subset, a superset, or a
> >>> "sibling") to the GO entity. You can spend ages
> >>> sorting this sort of thing out and coming up with
> >>> consistent rules for handling all the relationships.
> >>>
> >>>
> >>> Since you aren't really interested in sorting out this
> >>> biological mess, you may want to consider letting
> >>> people choose their own ontology and just reference
> >>> it.
> >>> An example of this practice is in the MIAME project:
> >>> http://www.mged.org/Workgroups/MIAME/miame_1.1.html
> >>>
> >>> About the citations: my memory of this is fuzzy, but I
> >>> think the original intent was that people should
> >>> provide the PubMed ID where possible. However, not all
> >>> journals are indexed in PubMed (for instance, there is
> >>> a CellML paper published in one that is not), so the
> >>> model needs to handle full citation info, too. The BQS
> >>> model handles both, and then some, which is why we
> >>> chose it.
> >>>
> >>> Hope this is helpful,
> >>> Melanie
> >>>
> >>>
> >>> --- Andrew Miller <[EMAIL PROTECTED]> wrote:
> >>>
> >>>> Matt wrote:
> >>>>> I don't think this is a good idea.
> >>>>>
> >>>>> - I think bioentity should be deprecated; it has no intrinsic
> >>>>> semantic value.
> >>>>>
> >>>> It does, unfortunately, seem to usually target a
> >>>> literal node at the
> >>>> moment. It would be nice for this to at least be a
> >>>> resource, which could
> >>>> provide further information about the biological
> >>>> entity (or if we decide
> >>>> not to do that, at least a resource, with a
> >>>> dictionary and a process for
> >>>> adding new words to the dictionary to avoid
> >>>> duplication).
> >>>>
> >>>> It seems that GO (Gene Ontology) has terms for cell
> >>>> types, biological
> >>>> compartments, and so on, which would offer a better
> >>>> way to provide this
> >>>> information.
> >>>>
> >>>> I still think that this metadata is useful, even if
> >>>> the automated
> >>>> interpretation of it is currently difficult.
> >>>>> - If it is used currently, it should be left as its current minimum
> >>>>> specification which is to label and point to other bioinformatics
> >>>>> database IDs.
> >>>>>
> >>>> There are three layers of information here:
> >>>> Layer 1: What biological entity are we describing?
> >>>> (could be answered
> >>>> with a GO term).
> >>>> Layer 2: What information about that biological
> >>>> entity are we using?
> >>>> (could be answered with a reference to a paper, and
> >>>> perhaps even a
> >>>> reference to raw experimental data).
> >>>> Layer 3: How was that information translated into a
> >>>> model (could be
> >>>> answered with a reference to a paper on the model).
> >>>>
> >>>> Layer 3 is clearly information about the model, and
> >>>> should be described
> >>>> as an arc of the model resource.
> >>>> Layer 1 is described by a literal at the moment.
> >>>>
> >>>> Layer 2 is therefore a gap, which we don't have any
> >>>> proper way to
> >>>> represent now.
> >>>>> - The problem is not 'biologically related papers' per se, but one of
> >>>>> identifying what was the primary publication or publications that
> >>>>> motivated a model.
> >>>>>
> >>>> The publication which motivated the expression of a
> >>>> model in CellML, or
> >>>> the publication which motivated the creation of the
> >>>> model? Most of the
> >>>> models in the repository were motivated by a paper
> >>>> about a model which
> >>>> was not initially expressed in CellML. However, the
> >>>> way that the
> >>>> metadata specification works now is that the paper
> >>>> which describes the
> >>>> model (not the paper which motivated it) is
> >>>> referenced from the
> >>>> information about the model (not information about
> >>>> the CellML file).
> >>>>> - There is also the case where a single publication that contains a
> >>>>> mathematical model is the one and only primary source for the model
> >>>>> itself - a rather common case at the moment.
> >>>>>
> >>>> This is what most models in CellML should aim to
> >>>> attain. Models can be
> >>>> submitted prior to publication as a model, but the
> >>>> step of going from
> >>>> the biology to a model is something which does need
> >>>> peer review.
> >>>>> I would prefer that the primary publication(s) be identified as such,
> >>>>> which covers the case where there are some models in the repository
> >>>>> built from general review papers of biology with no math.
> >>>>>
> >>>> If a model is built in that way, it should reference
> >>>> the review papers
> >>>> as information about the biology, and the author
> >>>> should ideally submit
> >>>> it for publication, at which point the reference to
> >>>> the paper could be
> >>>> filled in.
> >>>>> I would prefer references to other related publications to be bound
> >>>>> explicitly to a comment in the model metadata - there should be a
> >>>>> reason identified by the author/editor/reviewer as to why there has
> >>>>> been such an association made.
> >>>>>
> >>>> The problem with this is that the comment is not
> >>>> machine readable, so
> >>>> there is then no way to get aggregate statistics on
> >>>> why models are
> >>>> linked. There is also a potential for significant
> >>>> duplication of
> >>>> information, as opposed to a set of standardised
> >>>> predicate terms for
> >>>> linking to a set of models.
> >>>>> As an aside, we also need to determine whether the bqs schema provides
> >>>>> enough detail to match publications across metadata instances for
> >>>>> different models, and whether we should be complementing bibliographic
> >>>>> data with pubmed Ids and the like.
> >>>>>
> >>>> I think that the PUBMED ID is always useful, because
> >>>> it allows CellML
> >>>> processing software (e.g. the repository) to link
> >>>> directly to the Entrez
> >>>> / PUBMED page. We could build links based on
> >>>> searches for authors and
> >>>> titles, but a unique ID is much cleaner. It seems
> >>>> that many repository
> >>>> models do have PUBMED IDs on them.
> >>>>
> >>>> Best regards,
> >>>> Andrew
> >>>>
> >>>> _______________________________________________
> >>>> cellml-discussion mailing list
> >>>> [email protected]
> >>>> http://www.cellml.org/mailman/listinfo/cellml-discussion
> >>>>
> >>>
> >>
> >> --
> >> Nicolas LE NOVERE,  Computational Neurobiology,
> >> EMBL-EBI, Wellcome-Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
> >> Tel: +44(0)1223494521, Fax: +44(0)1223494468, Mob: +44(0)7833147074
> >> http://www.ebi.ac.uk/~lenov, AIM: nlenovere, MSN: [EMAIL PROTECTED]