Re: [cellml-discussion] Concerning the CellML Model Repository

Andrew Miller Thu, 21 Jun 2007 15:50:29 -0700

Tommy Yu wrote:
> Hi,
>
> I have written down some of my thoughts on how the model repository could be 
> put together.
>
> http://www.cellml.org/Members/tommy/repository_redesign.html
>
> It is still a pretty rough document.  The usage example section gives a rough 
> outline on what I see people might be doing with the repository and how this 
> design could address those issues, which I think it will be of interest to 
> users.  It is not an exhaustive list, yet.
>
> I must also note the design outlined is quite a drastic departure from what 
> we have now (it will be yet another new repository).  However, it is more 
> true to the one envisioned before according to 
> http://www.cellml.org/wiki/CellMLModelRepositories, except I have an addition 
> layer that will assist in pulling content and drawing relationships between 
> models.
>
> Feel free to take it apart and/or build on top of it.
>   
Hi Tommy,


A few comments:
1) I am still not convinced that meta-data should not be versioned, 
simply because changes to metadata can be important changes to a model. 
In some cases, such as changes to simulation metadata, the changes might 
have a major impact on the final model.

I don't think it is a bad thing to have a one-way cache of metadata 
somewhere for technical / performance reasons (perhaps in a relational 
database), but I think that we should replicate data for each model 
(perhaps using a deep copy-on-write approach if this is really necessary 
to save disk space) rather than changing the metadata for existing 
models without changing the version.

Making changes to metadata require changes to the model will ensure that 
no one gets burned by referencing a particular version of a model, only 
to find that the metadata in that version has changed on them.

Your current unversioned, globally shared metadata approach probably 
also has security implications. For example, lets say that Alice submits 
a model which references a publication. Now suppose that Charlie wasn't 
an author of that paper, but he wants to add his name onto the list of 
authors. So he submits a completely different, bogus, model which 
includes metadata for the publication, and includes his name. When Bob 
downloads Alice's model from the repository, it would then include 
Charlie's name as one of the authors (assuming that the publication was 
referenced by PubMed ID or DOI or some sort of publication URI. 
Particular cases like the one I described might be able to be secured in 
an ad hoc fashion such as by checking that the authors are the same, but 
the general attack will still pervade this type of approach unless 
metadata is associated uniquely with a particular version of a 
particular model. If the assertions about the same subject cannot be 
identified between models in the database, then having data flow back 
from the relational database into the model does not carry any benefit 
at all).

However, I do agree that there is a place for some metadata which can be 
changed without creating a new version (which probably is the type of 
metadata that you wouldn't include in the CellML file by default). 
Curation status and permissions would probably fit in this category, 
because although they may be associated with a particular version, they 
should not be immutable for a given version.

2) I think that there should be a directory for each mathematical model 
(which may include several CellML model files, documentation, and so 
on), so that a particular version can be downloaded / checked out in its 
entirety (with some directory-level manifest describing how to run or 
view the model). This suggests that collisions between mathematical 
models should be prevented at this level, not at the file level. Under 
this scheme, Mary would find that at usage example 3, she couldn't use 
the same directory name as the one John already submitted.

3) I think the 'reference by citation' needs some expansion: I think 
people referencing models should have the choice to refer to:
 => a specific version for which no files will change at all.
 => the latest version which aims to reflect the letter of a publication 
(updates will only fix mistakes in the model which prevent it from 
corresponding to the printed paper).
 => the latest version which aims to reflect the results obtained by the 
author (updates can fix discrepancies or omissions from the paper that 
were in the author's original code, if the author didn't use CellML).
 => the latest derivative of the current model developed by the same 
author / group, even if it has not yet been peer-reviewed (subject to 
permissions constraints).
 => the latest derivative of the current model, but with all imports 
external to the model updated to the latest versions (even if this has 
not been reviewed by the author). This would be the most frequently 
updated version, because it could be automatically created without the 
model author being involved.

It would also be possible to search for derivatives made by other authors.

4) I'm not sure that the keywords based URIs are strictly necessary. 
Perhaps search functionality which links to models is enough for this 
(which avoids a whole set URI stability issues)?

Best regards,
Andrew

> Cheers,
> Tommy.
> _______________________________________________
> cellml-discussion mailing list
> cellml-discussion@cellml.org
> http://www.cellml.org/mailman/listinfo/cellml-discussion
>   

_______________________________________________
cellml-discussion mailing list
cellml-discussion@cellml.org
http://www.cellml.org/mailman/listinfo/cellml-discussion

Re: [cellml-discussion] Concerning the CellML Model Repository

Reply via email to