Tommy Yu wrote:
> Hi,
>
> I have written down some of my thoughts on how the model repository could be
> put together.
>
> http://www.cellml.org/Members/tommy/repository_redesign.html
>
> It is still a pretty rough document. The usage example section gives a rough
> outline on what I see people might be doing with the repository and how this
> design could address those issues, which I think it will be of interest to
> users. It is not an exhaustive list, yet.
>
> I must also note the design outlined is quite a drastic departure from what
> we have now (it will be yet another new repository). However, it is more
> true to the one envisioned before according to
> http://www.cellml.org/wiki/CellMLModelRepositories, except I have an addition
> layer that will assist in pulling content and drawing relationships between
> models.
>
> Feel free to take it apart and/or build on top of it.
>

Hi Tommy,

A few comments:

1) I am still not convinced that metadata should not be versioned, simply because changes to metadata can be important changes to a model. In some cases, such as changes to simulation metadata, the changes might have a major impact on the final model.

I don't think it is a bad thing to have a one-way cache of metadata somewhere for technical / performance reasons (perhaps in a relational database), but I think that we should replicate data for each model (perhaps using a deep copy-on-write approach if this is really necessary to save disk space) rather than changing the metadata for existing models without changing the version. If changes to metadata require changes to the model, no one gets burned by referencing a particular version of a model, only to find that the metadata in that version has changed on them.

Your current unversioned, globally shared metadata approach probably also has security implications. For example, let's say that Alice submits a model which references a publication. Now suppose that Charlie wasn't an author of that paper, but he wants to add his name to the list of authors. So he submits a completely different, bogus model which includes metadata for the publication, and includes his name. When Bob downloads Alice's model from the repository, it would then include Charlie's name as one of the authors (assuming that the publication was referenced by PubMed ID or DOI or some sort of publication URI). Particular cases like the one I described could perhaps be secured in an ad hoc fashion, such as by checking that the authors are the same, but the general attack will still pervade this type of approach unless metadata is associated uniquely with a particular version of a particular model. (If the assertions about the same subject cannot be identified between models in the database, then having data flow back from the relational database into the model does not carry any benefit at all.)

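To make the Alice / Charlie / Bob scenario concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the class names, the dictionary layout and the made-up publication identifier are all hypothetical and are not taken from the redesign document. It contrasts a store where metadata is shared globally by subject with one where metadata is tied to a particular version of a particular model.

```python
import copy

# Hypothetical sketch: none of these names come from the repository design.
PAPER = "urn:example:publication:1"  # made-up publication identifier


class SharedMetadataStore:
    """Unversioned metadata, shared globally and keyed by subject URI."""

    def __init__(self):
        self._by_subject = {}

    def submit(self, model, version, metadata):
        # Assertions from any later submission overwrite earlier ones
        # about the same subject, whichever model they arrived with.
        for subject, assertions in metadata.items():
            self._by_subject.setdefault(subject, {}).update(assertions)

    def metadata_for(self, model, version, subjects):
        return {s: dict(self._by_subject[s]) for s in subjects}


class VersionedMetadataStore:
    """Metadata stored per (model, version) and never changed afterwards."""

    def __init__(self):
        self._by_version = {}

    def submit(self, model, version, metadata):
        key = (model, version)
        if key in self._by_version:
            raise ValueError("metadata for %s version %s is immutable" % key)
        self._by_version[key] = copy.deepcopy(metadata)

    def metadata_for(self, model, version, subjects):
        stored = self._by_version[(model, version)]
        return {s: dict(stored[s]) for s in subjects}


def run_scenario(store):
    # Alice submits her model with metadata about the publication.
    store.submit("alice_model", 1, {PAPER: {"authors": ["Alice"]}})
    # Charlie submits a bogus model asserting different authors.
    store.submit("bogus_model", 1, {PAPER: {"authors": ["Alice", "Charlie"]}})
    # Bob downloads version 1 of Alice's model and reads the author list.
    return store.metadata_for("alice_model", 1, [PAPER])[PAPER]["authors"]


print(run_scenario(SharedMetadataStore()))     # Charlie appears as an author
print(run_scenario(VersionedMetadataStore()))  # only Alice is listed
```

With the shared store, Bob sees Charlie listed as an author of the paper referenced by Alice's model. With the per-version store, Charlie's assertions stay attached to his own model, and changing Alice's metadata would require submitting a new version.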
However, I do agree that there is a place for some metadata which can be changed without creating a new version (which is probably the type of metadata that you wouldn't include in the CellML file by default). Curation status and permissions would probably fit in this category, because although they may be associated with a particular version, they should not be immutable for a given version.

2) I think that there should be a directory for each mathematical model (which may include several CellML model files, documentation, and so on), so that a particular version can be downloaded / checked out in its entirety (with some directory-level manifest describing how to run or view the model). This suggests that collisions between mathematical models should be prevented at this level, not at the file level. Under this scheme, in usage example 3, Mary would find that she couldn't use the same directory name as the one John had already submitted.

3) I think the 'reference by citation' needs some expansion: I think people referencing models should have the choice to refer to:
=> a specific version for which no files will change at all.
=> the latest version which aims to reflect the letter of a publication (updates will only fix mistakes in the model which prevent it from corresponding to the printed paper).
=> the latest version which aims to reflect the results obtained by the author (updates can fix discrepancies or omissions from the paper that were in the author's original code, if the author didn't use CellML).
=> the latest derivative of the current model developed by the same author / group, even if it has not yet been peer-reviewed (subject to permissions constraints).
=> the latest derivative of the current model, but with all imports external to the model updated to the latest versions (even if this has not been reviewed by the author).
This would be the most frequently updated version, because it could be created automatically without the model author being involved. It would also be possible to search for derivatives made by other authors.

4) I'm not sure that the keyword-based URIs are strictly necessary. Perhaps search functionality which links to models is enough for this (which avoids a whole set of URI stability issues)?

Best regards,
Andrew

> Cheers,
> Tommy.
> _______________________________________________
> cellml-discussion mailing list
> cellml-discussion@cellml.org
> http://www.cellml.org/mailman/listinfo/cellml-discussion