Hi, I thought Andrew's ideas here are worth expanding, so I wrote a page based on them.
http://www.cellml.org/Members/tommy/BaseRepository

Cheers,
Tommy.

Andrew Miller wrote:
> Matt wrote:
>>> - Version/Variant
>>> It already clogged up the system. There is no proper revision control
>>> mechanism, what we have now is an ad-hoc emulated system.
>>>
>> I don't think it has clogged the system I just think it has been
>> improperly used both by authors and by the user interface. This is no
>> fault of the authors, there is simply a specification for versioning
>> that is missing. The hope is that subversion applies well to this.
>>
> I think that the versioning system itself is the root of the problem,
> because it is simultaneously too complicated and too limited.
>
> In particular:
> Branching is inherently a hierarchical process with arbitrary depth, in
> the sense that branches can be made from branches to an arbitrary depth.
> However, the variant / version system does not really provide the proper
> tools to deal with this, because it is limited to two levels (variant
> and version) before its utility in tracking what is a derivative of what
> is exhausted.
>
> It is also inadequate because a new model might combine parts of other
> models, especially if it is a 1.1 model, and these parts need to be
> tracked individually.
>
> I think that the solution is to simplify down to a single global version
> number that is common across the repository or the model (like in
> Subversion), and then let either the CellML metadata, or perhaps the
> Subversion copy history, describe the way a model has been derived.
>
> I see the following workflow as being both simpler and more general...
>
> John Doe creates a new model directory which has its primary URL at:
> http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/
>
> John now owns this model and is the only one who can change it. John
> also gets to decide the visibility of different revisions of the model.
>
> John makes several revisions to the model (each of which bumps the
> global revision number). There is a URL by which each historic version
> can be referred to.
>
> John then publishes the model in a journal, referring to it by the
> primary URL (or perhaps a short-form if we want to offer authors the
> option of assigning one). After the paper is accepted by a peer-reviewed
> journal, John updates the metadata on the model. When he commits these
> changes, the repository sees this and creates a new alias, e.g. at:
> http://www.cellml.org/models/citation/doe_2007_1/
>
> John makes some further changes to his model post-publication and
> commits them. However, by some mechanism (perhaps by the change
> metadata?) the repository knows that this is a change which occurred
> post-publication by John.
>
> Mary notices that there was a discrepancy between the model and John's
> published paper (assuming that he didn't reference the CellML model in
> the paper). She creates a new primary URL containing a copy of John's
> published model, at:
> http://www.cellml.org/models/id/281ab697-4607-4fcf-a433-f3ec382fb445/
> She gets John to check this. When John agrees, she updates the metadata
> on her model to indicate that her version is a more correct version of
> John's paper. The repository then updates so that
> http://www.cellml.org/models/citation/doe_2007_1/ is a reference to
> John's fixed version.
>
> John merges in Mary's changes to
> http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/
> and continues working on more changes. He starts collaborating with
> Mary, so he grants her write access to
> http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/.
>
> Ming wants to create a derivative of John's paper, so he creates a copy
> of the revision referenced from
> http://www.cellml.org/models/citation/doe_2007_1/ at
> http://www.cellml.org/models/id/7a8996e1-8d05-4a29-a7d8-622d047804fc/
> and starts working on it (marking up the history in the model metadata).
>
> As you can see, instead of having a confusing mix of variants and
> versions (with versions of variants of versions of variants), having a
> single revision forces us to look at the metadata instead, which then is
> sufficiently general not to have the problems we have seen.
>
>>> - It's CellML Code, right?
>>> Why not put code in a real code management system, like Subversion?
>>>
>> Subversion works well for filesystems of code and text data and to
>> some extent binary data that we don't really need to query the
>> contents of. If this applies well for CellML modelling, then
>> subversion is probably a good match. Subversion will bring its own
>> complexities when we are dealing with applying security to file
>> objects,
> It depends whether or not we actually allow direct access to Subversion
> by untrusted users.
> A simple approach would be to make everyone go through the front-end
> (which might even implement enough methods to let Subversion check out
> from there anyway).
>
>> and security/publishing in general will get even more complex
>> if we are proxying remote repositories - which we talked about a few
>> weeks ago.
>>
>> Generally, I think the concept of cellml modelling being laid out in a
>> filesystem and subversion versioning concepts applied to it is good,
>> but untested. For instance, take a reasonably complex model of Andre's
>> and work out how it will look on the filesystem and what subversion
>> versioning would result in.
>>
> I think Andre already has a layout for his model (with relative URLs).
> Letting the author decide what it looks like is probably a good first step.
>> While in this thread, I don't believe metadata should be treated any
>> differently to model data. Adding special rules for versioning of some
>> data and not others is going to complicate the versioning process and
>> I can't see any compelling reason to do this.
> I agree (for metadata about the model at least. Permissions etc... are a
> special case of course).
>> Remember that the
>> subversion system is versioning file objects which will contain both
>> metadata and cellml model data. What is important is how and where
>> metadata is stored. Perhaps metadata should be separated into its own
>> document sitting next to the model in the filesystem.
>>
> Model is a confusing word because CellML 1.1 models can combine several
> models to make one mathematical model. There is a case for metadata /
> manifest about the mathematical model as well as metadata about each of the
> CellML models that make up the mathematical model.
>> My inclination is that an implementation using subversion plus some
>> subversion hooks will be ok, but we haven't worked out details or done
>> any proof of concept for this - which should be agnostic to cellml
>>
> This would have the benefit of supporting non-CellML models, although it
> means that we have to change the CellML models if we are going to
> include RDF/XML serialisations inside them.
>
> Perhaps a generic framework with some XML with embedded RDF specific
> parts slotted into it would be better.
>
>> and focussed on how to apply zope+cmf security and workflows to data
>> objects stored in subversion repositories.
>>
> If we are going to be doing a major re-write, now is the time to
> consider if we should be using Zope, or if we want to proxy this part of
> the site to some other technology (I think that the decision the first
> time was not discussed at CellML meetings at all, and has had a lot of
> unfortunate consequences, so I don't think it is completely out of the
> question to reconsider technologies.
> The fact that we are already using
> it probably carries some weight in the decision, but other factors might
> be enough to tip the balance in another direction).
>>
>>> - Zope has revision control
>>> Until someone packs the database.
>>>
>> Perhaps you should look at http://plone.org/products/plone/roadmap/8
>> (which is now completed and merged into Plone 3). There are some other
>> add on products - some listed in
>> http://plone.org/products/by-category/versioning-staging
>>
>>
>>
>>> - Zope/Plone is also quite slow.
>>>
>> Really? How so?
>>
> I think an interpreted language, even a byte-compiled one, will always
> be slow, and all the layers of abstraction from Zope and Plone probably
> make this worse. However, I'm not sure that it is the bottleneck for the
> majority of users given the recent thread about network speeds.
>>
>>> - Code we have now cannot get away from original design flaws. Might as
>>> well start from scratch.
>>>
>> Refactoring may achieve the outcome better.
>>
> I agree that this will be better in general (throwing away everything is
> probably a bit drastic, I am sure that there are some parts of the code
> that are still usable). Of course, if we move off Python this might be
> the only option, so we should keep an open mind but be wary of the costs
> of doing so.
>
> Best regards,
> Andrew
>
> _______________________________________________
> cellml-discussion mailing list
> [email protected]
> http://www.cellml.org/mailman/listinfo/cellml-discussion
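
P.S. To make the workflow Andrew describes above a bit more concrete, here is a rough Python sketch of it: one global revision counter, models identified by a UUID, citation aliases that point at a (model, revision) pair, and a "derived from" link standing in for the copy history or metadata. All of the names here (Repository, Model, publish, copy, lineage) are made up for illustration. This is not how the current repository or Subversion works, just one way the idea could be modelled.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical names throughout; an illustration of the proposal, not real code.
Ref = Tuple[str, int]  # (model id, global revision)


@dataclass
class Model:
    owner: str
    writers: Set[str]
    revisions: Dict[int, str] = field(default_factory=dict)  # revision -> content
    derived_from: Optional[Ref] = None  # where this model was copied from, if anywhere


class Repository:
    def __init__(self) -> None:
        self.revision = 0                    # single counter shared by all models
        self.models: Dict[str, Model] = {}
        self.aliases: Dict[str, Ref] = {}    # e.g. "doe_2007_1" -> (model id, revision)

    def create(self, owner: str, content: str,
               derived_from: Optional[Ref] = None) -> str:
        """Create a model under a fresh UUID; the creator is the only writer."""
        model_id = str(uuid.uuid4())
        self.models[model_id] = Model(owner, {owner}, derived_from=derived_from)
        self.commit(model_id, owner, content)
        return model_id

    def commit(self, model_id: str, user: str, content: str) -> int:
        """Store new content; every commit bumps the global revision number."""
        model = self.models[model_id]
        if user not in model.writers:
            raise PermissionError(f"{user} may not change {model_id}")
        self.revision += 1
        model.revisions[self.revision] = content
        return self.revision

    def publish(self, alias: str, model_id: str, revision: int) -> None:
        """Point a citation alias at one revision of one model."""
        if revision not in self.models[model_id].revisions:
            raise KeyError(f"{model_id} has no revision {revision}")
        self.aliases[alias] = (model_id, revision)

    def copy(self, owner: str, source: Ref) -> str:
        """Start a new model from an existing revision, remembering its origin."""
        model_id, revision = source
        content = self.models[model_id].revisions[revision]
        return self.create(owner, content, derived_from=source)

    def lineage(self, model_id: str) -> List[Ref]:
        """Follow derived_from links back to the original model."""
        chain: List[Ref] = []
        link = self.models[model_id].derived_from
        while link is not None:
            chain.append(link)
            link = self.models[link[0]].derived_from
        return chain


repo = Repository()
john = repo.create("john", "draft")
published = repo.commit(john, "john", "as published")
repo.publish("doe_2007_1", john, published)

mary = repo.copy("mary", repo.aliases["doe_2007_1"])
fixed = repo.commit(mary, "mary", "corrected to match the paper")
repo.publish("doe_2007_1", mary, fixed)   # alias now refers to the fixed version

ming = repo.copy("ming", repo.aliases["doe_2007_1"])
print(repo.lineage(ming))                 # Mary's fixed revision, then John's
```

Note that there is no variant or version field anywhere: what is derived from what comes entirely from the derived_from links, so the derivation chain can be as deep as it needs to be.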
