Matt wrote:
>> - Version/Variant
>> It already clogged up the system. There is no proper revision control
>> mechanism, what we have now is an ad-hoc emulated system.
>>
>
> I don't think it has clogged the system I just think it has been
> improperly used both by authors and by the user interface. This is no
> fault of the authors, there is simply a specification for versioning
> that is missing. The hope is that subversion applies well to this.
>

I think that the versioning system itself is the root of the problem,
because it is simultaneously too complicated and too limited.
In particular: branching is inherently hierarchical, in the sense that
branches can be made from branches to an arbitrary depth. However, the
variant / version system does not really provide the proper tools to
deal with this, because it is limited to two levels (variant and
version), after which it can no longer track what is a derivative of
what. It is also inadequate because a new model might combine parts of
other models, especially if it is a 1.1 model, and these parts need to
be tracked individually.

I think that the solution is to simplify down to a single global
revision number that is common across the repository or the model (like
in Subversion), and then let either the CellML metadata, or perhaps the
Subversion copy history, describe the way a model has been derived.

I see the following workflow as being both simpler and more general...

John Doe creates a new model directory which has its primary URL at:
http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/
John now owns this model and is the only one who can change it. John
also gets to decide the visibility of different revisions of the model.

John makes several revisions to the model (each of which bumps the
global revision number). There is a URL by which each historic revision
can be referred to.

John then publishes the model in a journal, referring to it by the
primary URL (or perhaps a short form, if we want to offer authors the
option of assigning one).

After the paper is accepted by a peer-reviewed journal, John updates the
metadata on the model. When he commits these changes, the repository
sees this and creates a new alias, e.g. at:
http://www.cellml.org/models/citation/doe_2007_1/

John makes some further changes to his model post-publication and
commits them. However, by some mechanism (perhaps by the change
metadata?) the repository knows that this is a change which John made
post-publication.

Mary notices that there is a discrepancy between the model and John's
published paper (assuming that he didn't reference the CellML model in
the paper). She creates a new primary URL containing a copy of John's
published model, at:
http://www.cellml.org/models/id/281ab697-4607-4fcf-a433-f3ec382fb445/
She gets John to check this. When John agrees, she updates the metadata
on her model to indicate that her version is a more correct version of
the model in John's paper. The repository then updates so that
http://www.cellml.org/models/citation/doe_2007_1/ refers to the fixed
version of John's model.

John merges Mary's changes into
http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/
and continues working on more changes. He starts collaborating with
Mary, so he grants her write access to
http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/.

Ming wants to create a derivative of the model in John's paper, so he
creates a copy of the revision referenced from
http://www.cellml.org/models/citation/doe_2007_1/ at
http://www.cellml.org/models/id/7a8996e1-8d05-4a29-a7d8-622d047804fc/
and starts working on it (marking up the history in the model metadata).

As you can see, instead of having a confusing mix of variants and
versions (with versions of variants of versions of variants), having a
single revision number forces us to record derivation in the metadata
instead, which is general enough to avoid the problems we have seen.

>> - It's CellML Code, right?
>> Why not put code in a real code management system, like Subversion?
>>
>
> Subversion works well for filesystems of code and text data and to
> some extent binary data that we don't really need to query the
> contents of. If this applies well for CellML modelling, then
> subversion is probably a good match. Subversion will bring its own
> complexities when we are dealing with applying security to file
> objects,

It depends whether or not we actually allow direct access to Subversion
by untrusted users.
A simple approach would be to make everyone go through the front-end
(which might even implement enough methods to let Subversion check out
from there anyway).

> and security/publishing in general will get even more complex
> if we are proxying remote repositories - which we talked about a few
> weeks ago.
>
> Generally, I think the concept of cellml modelling being laid out in a
> filesystem and subversion versioning concepts applied to it is good,
> but untested. For instance, take a reasonably complex model of Andre's
> and work out how it will look on the filesystem and what subversion
> versioning would result in.
>

I think Andre already has a layout for his model (with relative URLs).
Letting the author decide what it looks like is probably a good first
step.

> While in this thread, I don't believe metadata should be treated any
> differently to model data. Adding special rules for versioning of some
> data and not others is going to complicate the versioning process and
> I can't see any compelling reason to do this.

I agree (for metadata about the model at least; permissions etc. are a
special case, of course).

> Remember that the
> subversion system is versioning file objects which will contain both
> metadata and cellml model data. What is important is how and where
> metadata is stored. Perhaps metadata should be separated into its own
> document sitting next to the model in the filesystem.
>

"Model" is a confusing word because CellML 1.1 models can combine
several models to make one mathematical model. There is a case for
metadata / a manifest about the mathematical model as well as metadata
about each of the CellML models that make up the mathematical model.
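To make the single-revision workflow described earlier in this message a
bit more concrete, here is a rough Python sketch. It is only an
illustration under stated assumptions: the Repository and Model classes,
their method names, the alias name and the in-memory storage are all
hypothetical, made up for this example, and are not part of the existing
repository code or of Subversion. It shows one global revision counter,
UUID-style model IDs, citation aliases that can be re-pointed, and
derivation recorded as metadata rather than as variants.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Model:
    """A model directory: one owner, a set of writers, and its history."""
    owner: str
    writers: set = field(default_factory=set)
    derived_from: list = field(default_factory=list)  # (model_id, revision) pairs
    revisions: list = field(default_factory=list)     # global revisions touching it


class Repository:
    """Hypothetical in-memory stand-in for the model repository."""

    def __init__(self):
        self.revision = 0   # single global revision number, Subversion-style
        self.models = {}    # model id -> Model
        self.aliases = {}   # citation alias -> (model id, revision)

    def create(self, owner, derived_from=()):
        """Create a model under a fresh UUID; the creator owns it."""
        model_id = str(uuid.uuid4())
        self.models[model_id] = Model(owner=owner, writers={owner},
                                      derived_from=list(derived_from))
        self.commit(model_id, owner)
        return model_id

    def commit(self, model_id, user):
        """Record a change; every commit bumps the global revision."""
        model = self.models[model_id]
        if user not in model.writers:
            raise PermissionError(f"{user} may not change {model_id}")
        self.revision += 1
        model.revisions.append(self.revision)
        return self.revision

    def grant_write(self, model_id, owner, user):
        model = self.models[model_id]
        if owner != model.owner:
            raise PermissionError("only the owner can grant write access")
        model.writers.add(user)

    def set_alias(self, alias, model_id, revision):
        """Point a citation alias at one specific revision of a model."""
        if revision not in self.models[model_id].revisions:
            raise ValueError("no such revision of this model")
        self.aliases[alias] = (model_id, revision)


repo = Repository()
john = repo.create("john")
repo.commit(john, "john")
published = repo.commit(john, "john")
repo.set_alias("doe_2007_1", john, published)
repo.commit(john, "john")   # post-publication change; the alias is unaffected

# Mary copies the published revision, fixes it, and the alias is re-pointed
# once John agrees (the agreement step itself is not modelled here).
mary = repo.create("mary", derived_from=[repo.aliases["doe_2007_1"]])
fixed = repo.commit(mary, "mary")
repo.set_alias("doe_2007_1", mary, fixed)

repo.grant_write(john, "john", "mary")
repo.commit(john, "mary")   # Mary can now commit to John's model

# Ming derives from whatever the citation alias currently refers to.
ming = repo.create("ming", derived_from=[repo.aliases["doe_2007_1"]])
print(repo.revision, repo.models[ming].derived_from == [(mary, fixed)])
```

The point of the sketch is that nothing in it needs a variant or version
field: the history of who derived what from whom lives entirely in
derived_from, and that could equally be CellML metadata or Subversion
copy history.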
> My inclination is that an implementation using subversion plus some
> subversion hooks will be ok, but we haven't worked out details or done
> any proof of concept for this - which should be agnostic to cellml
>

This would have the benefit of supporting non-CellML models, although it
means that we have to change the CellML models if we are going to
include RDF/XML serialisations inside them. Perhaps a generic framework
with some XML with embedded RDF specific parts slotted into it would be
better.

> and focussed on how to apply zope+cmf security and workflows to data
> objects stored in subversion repositories.
>

If we are going to be doing a major re-write, now is the time to
consider whether we should be using Zope, or whether we want to proxy
this part of the site to some other technology. (I think that the
decision the first time was not discussed at CellML meetings at all, and
it has had a lot of unfortunate consequences, so I don't think it is
completely out of the question to reconsider technologies. The fact that
we are already using Zope probably carries some weight in the decision,
but other factors might be enough to tip the balance in another
direction.)

>
>> - Zope has revision control
>> Until someone packs the database.
>>
>
> Perhaps you should look at http://plone.org/products/plone/roadmap/8
> (which is now completed and merged into Plone 3). There are some other
> add on products - some listed in
> http://plone.org/products/by-category/versioning-staging
>
>
>
>> - Zope/Plone is also quite slow.
>>
>
> Really? How so?
>

I think an interpreted language, even a byte-compiled one, will always
be relatively slow, and all the layers of abstraction from Zope and
Plone probably make this worse. However, I'm not sure that it is the
bottleneck for the majority of users, given the recent thread about
network speeds.

>
>> - Code we have now cannot get away from original design flaws. Might as
>> well start from scratch.
>>
>
> Refactoring may achieve the outcome better.
>

I agree that refactoring will be better in general (throwing away
everything is probably a bit drastic; I am sure that there are some
parts of the code that are still usable). Of course, if we move off
Python, starting from scratch might be the only option, so we should
keep an open mind but be wary of the costs of doing so.

Best regards,
Andrew
_______________________________________________
cellml-discussion mailing list
cellml-discussion@cellml.org
http://www.cellml.org/mailman/listinfo/cellml-discussion