Re: [cellml-discussion] Concerning the CellML Model Repository

Andrew Miller Thu, 21 Jun 2007 22:45:28 -0700

Matt wrote:
>> - Version/Variant
>> It already clogged up the system.  There is no proper revision control 
>> mechanism; what we have now is an ad-hoc emulated system.
>>     
>
> I don't think it has clogged the system; I just think it has been
> used improperly, both by authors and by the user interface. This is no
> fault of the authors; a specification for versioning is simply
> missing. The hope is that Subversion applies well to this.
>   
I think that the versioning system itself is the root of the problem, 
because it is simultaneously too complicated and too limited.


In particular:
Branching is inherently hierarchical: branches can be made from 
branches, to arbitrary depth. However, the variant / version system does 
not provide the tools to deal with this, because it has only two levels 
(variant and version), after which it can no longer track what is 
derived from what.

It is also inadequate because a new model might combine parts of other 
models, especially a CellML 1.1 model (which can import components from 
other models), and these parts need to be tracked individually.

I think that the solution is to simplify down to a single global version 
number that is common across the repository or the model (like in 
Subversion), and then let either the CellML metadata, or perhaps the 
Subversion copy history, describe the way a model has been derived.
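As a rough illustration (a minimal Python sketch, not a design: the `Repository` and `Commit` classes are hypothetical and everything is held in memory), one global counter plus a "copied from" link on each commit is enough to follow derivation to any depth, which the two-level variant/version scheme cannot do:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Ref = Tuple[str, int]  # (model_id, global revision)


@dataclass
class Commit:
    model_id: str
    revision: int
    content: str
    parent: Optional[Ref] = None  # the (model_id, revision) this commit derives from


@dataclass
class Repository:
    head: int = 0  # one global revision number shared by every model
    commits: Dict[int, Commit] = field(default_factory=dict)  # revision -> Commit
    latest: Dict[str, int] = field(default_factory=dict)  # model_id -> newest revision

    def commit(self, model_id: str, content: str, copied_from: Optional[Ref] = None) -> int:
        """Record a new revision of model_id and return the bumped global revision.

        copied_from names the (model_id, revision) a new model was copied from;
        otherwise the parent is this model's own previous revision, if any.
        """
        self.head += 1
        parent = copied_from
        if parent is None and model_id in self.latest:
            parent = (model_id, self.latest[model_id])
        self.commits[self.head] = Commit(model_id, self.head, content, parent)
        self.latest[model_id] = self.head
        return self.head

    def lineage(self, revision: int) -> List[Ref]:
        """Follow parent links back to the first revision, to arbitrary depth."""
        chain = []
        commit = self.commits.get(revision)
        while commit is not None:
            chain.append((commit.model_id, commit.revision))
            # Revision numbers are global, so the parent's number alone finds it.
            commit = self.commits.get(commit.parent[1]) if commit.parent else None
        return chain
```

Because revision numbers are global, a parent link only needs the revision number to find the right commit, whichever model it belongs to.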

I think the following workflow is both simpler and more general:

John Doe creates a new model directory which has its primary URL at:
http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/

John now owns this model and is the only one who can change it. John 
also gets to decide the visibility of different revisions of the model.

John makes several revisions to the model (each of which bumps the 
global revision number). Each historic revision can be referred to by 
its own URL.
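A minimal Python sketch of this part of the workflow, assuming a hypothetical `Model` class and an in-process counter standing in for the repository's global revision number. The `@N` form of the per-revision URL is an assumption borrowed from Subversion's peg-revision notation; the real URL scheme is still undecided.

```python
import itertools
import uuid

BASE_URL = "http://www.cellml.org/models/id/"
_global_revision = itertools.count(1)  # stand-in for one counter shared by every model


class Model:
    """Hypothetical model directory owned by a single user."""

    def __init__(self, owner: str):
        self.id = str(uuid.uuid4())  # random UUID, like the IDs in the example URLs
        self.owner = owner
        self.revisions = []  # global revision numbers committed to this model
        self.visible = set()  # revisions the owner has chosen to make visible

    @property
    def primary_url(self) -> str:
        return f"{BASE_URL}{self.id}/"

    def revision_url(self, revision: int) -> str:
        # Hypothetical scheme: "@N" borrows Subversion's peg-revision notation.
        if revision not in self.revisions:
            raise KeyError(revision)
        return f"{self.primary_url}@{revision}"

    def commit(self, user: str) -> int:
        """Only the owner may commit; each commit takes the next global revision."""
        if user != self.owner:
            raise PermissionError(f"{user} may not change this model")
        revision = next(_global_revision)
        self.revisions.append(revision)
        return revision

    def set_visible(self, user: str, revision: int, visible: bool = True) -> None:
        """The owner alone decides which revisions others can see."""
        if user != self.owner:
            raise PermissionError("only the owner controls visibility")
        if visible:
            self.visible.add(revision)
        else:
            self.visible.discard(revision)
```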

John then publishes the model in a journal, referring to it by the 
primary URL (or perhaps a short-form if we want to offer authors the 
option of assigning one). After the paper is accepted by a peer-reviewed 
journal, John updates the metadata on the model. When he commits these 
changes, the repository sees this and creates a new alias, e.g. at:
http://www.cellml.org/models/citation/doe_2007_1/
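One way the alias step might work, sketched in Python under stated assumptions: the metadata is shown as a plain dict with a hypothetical `citation` entry (`first_author`, `year`, `peer_reviewed`), and `on_commit` stands in for whatever post-commit hook the repository would actually run. The `<surname>_<year>_<n>` alias form follows the doe_2007_1 example.

```python
import re

CITATION_BASE = "http://www.cellml.org/models/citation/"


def make_alias(first_author: str, year: int, taken) -> str:
    """Return the first free alias of the form <surname>_<year>_<n>."""
    stem = f"{re.sub(r'[^a-z]', '', first_author.lower())}_{year}"
    n = 1
    while f"{stem}_{n}" in taken:
        n += 1
    return f"{stem}_{n}"


def on_commit(aliases: dict, model_id: str, revision: int, metadata: dict):
    """Hypothetical post-commit step.

    aliases maps alias -> (model_id, revision). When a commit's metadata first
    records a peer-reviewed citation, a new alias is pinned to that revision.
    """
    for alias, (target_model, _) in aliases.items():
        if target_model == model_id:
            return CITATION_BASE + alias + "/"  # already published; keep the pin
    citation = metadata.get("citation")
    if not citation or not citation.get("peer_reviewed"):
        return None
    alias = make_alias(citation["first_author"], citation["year"], aliases)
    aliases[alias] = (model_id, revision)
    return CITATION_BASE + alias + "/"
```

Later commits leave the alias pinned to the publication revision, which is what makes post-publication changes distinguishable.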

John makes some further changes to his model post-publication and 
commits them. However, by some mechanism (perhaps by the change 
metadata?) the repository knows that this is a change which occurred 
post-publication by John.
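With a single increasing revision number, "post-publication" could reduce to a comparison with the revision the citation alias points at. A minimal sketch, assuming the history is available as (revision, author) pairs; the real mechanism is still an open question, and `classify_history` is a hypothetical helper.

```python
def classify_history(history, published_revision):
    """Label each (revision, author) pair in a model's history.

    published_revision is the revision the citation alias points at, or None
    if the model has not been published yet.
    """
    labels = {}
    for revision, author in history:
        if published_revision is None or revision < published_revision:
            labels[revision] = "pre-publication"
        elif revision == published_revision:
            labels[revision] = "published"
        else:
            labels[revision] = f"post-publication by {author}"
    return labels
```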

Mary notices a discrepancy between the model and John's published paper 
(assuming that he didn't reference the CellML model in the paper). She 
creates a new model, with its own primary URL, containing a copy of 
John's published model, at:
http://www.cellml.org/models/id/281ab697-4607-4fcf-a433-f3ec382fb445/
She gets John to check this. When John agrees, she updates the metadata 
on her model to indicate that her version is a more faithful 
implementation of John's paper. The repository then updates 
http://www.cellml.org/models/citation/doe_2007_1/ so that it refers to 
this corrected version of John's model.
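A sketch of the re-pointing step, assuming aliases are stored as a mapping from alias to (model_id, revision) and that John's sign-off is recorded as an approval tuple; `repoint_citation` and the approval format are hypothetical. The previous target is returned so it can still be kept in the history.

```python
def repoint_citation(aliases, alias, corrected, approvals, original_owner):
    """Point `alias` at `corrected`, a (model_id, revision) pair, once the
    original author has approved that exact revision.

    approvals is a set of (user, model_id, revision) tuples.
    Returns the previous target so it can be kept in the history.
    """
    if (original_owner, *corrected) not in approvals:
        raise PermissionError(f"{original_owner} has not approved {corrected}")
    previous = aliases[alias]
    aliases[alias] = corrected
    return previous
```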

John merges Mary's changes into 
http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/ 
and continues working on more changes. He starts collaborating with 
Mary, so he grants her write access to 
http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/.
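The access rule implied here (the owner can always write, and may grant write access to others) can be sketched as a tiny per-model access list. `ModelACL` is a hypothetical name; the merge itself would be ordinary Subversion work (e.g. `svn merge`).

```python
class ModelACL:
    """Hypothetical per-model access list: the owner can always write and may
    grant write access to collaborators."""

    def __init__(self, owner: str):
        self.owner = owner
        self.writers = {owner}

    def grant_write(self, granter: str, user: str) -> None:
        if granter != self.owner:
            raise PermissionError("only the owner can grant write access")
        self.writers.add(user)

    def can_write(self, user: str) -> bool:
        return user in self.writers
```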

Ming wants to create a derivative of the model from John's paper, so he 
copies the revision referenced from 
http://www.cellml.org/models/citation/doe_2007_1/ to 
http://www.cellml.org/models/id/7a8996e1-8d05-4a29-a7d8-622d047804fc/ 
and starts working on it (recording its history in the model metadata).
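A minimal sketch of Ming's copy step, assuming aliases map to (model_id, revision) pairs and models live in an in-memory `store`; `derive_from_citation` and the store layout are hypothetical. Because the alias is resolved at copy time, Ming gets whatever revision the citation currently points at, including a correction like Mary's.

```python
import uuid


def derive_from_citation(aliases, store, alias, new_owner, new_revision):
    """Copy the revision a citation alias currently points at into a new model.

    aliases maps alias -> (model_id, revision); store maps model_id -> a dict
    with "owner", "revisions" (revision -> content) and "metadata".
    new_revision is the next global revision number, supplied by the caller.
    """
    source_model, source_revision = aliases[alias]
    content = store[source_model]["revisions"][source_revision]
    new_id = str(uuid.uuid4())
    store[new_id] = {
        "owner": new_owner,
        "revisions": {new_revision: content},
        # Provenance is recorded in metadata, not in a variant/version number.
        "metadata": {
            "derived_from": {
                "model": source_model,
                "revision": source_revision,
                "citation": alias,
            }
        },
    }
    return new_id
```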

Instead of a confusing mix of variants and versions (versions of 
variants of versions of variants), a single revision number forces us 
to record derivation in the metadata, which is general enough to avoid 
the problems we have seen.

>> - It's CellML Code, right?
>> Why not put code in a real code management system, like Subversion?
>>     
>
> Subversion works well for filesystems of code and text data and, to
> some extent, binary data whose contents we don't really need to
> query. If this applies well to CellML modelling, then
> Subversion is probably a good match. Subversion will bring its own
> complexities when we are dealing with applying security to file
> objects,
It depends on whether we allow untrusted users direct access to 
Subversion.
A simple approach would be to make everyone go through the front-end 
(which might even implement enough of Subversion's HTTP protocol to let 
clients check out through it anyway).
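If all access goes through the front-end, it has to decide what to forward to Subversion. A minimal sketch of that check, assuming each model is described by a dict with hypothetical `owner`, `writers` and `public` keys and that anonymous users are represented by None:

```python
def authorize(user, action, model, revision=None):
    """Decide whether the front-end should forward a request to Subversion.

    model is a dict with "owner", "writers" (a set) and "public" (a set of
    revisions the owner has made visible); user is None for anonymous access.
    """
    members = {model["owner"]} | model["writers"]
    if action == "read":
        return user in members or revision in model["public"]
    if action == "write":
        return user in members
    return False  # anything else is refused by default
```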

> and security/publishing in general will get even more complex
> if we are proxying remote repositories - which we talked about a few
> weeks ago.
>
> Generally, I think the concept of CellML modelling being laid out in a
> filesystem with Subversion versioning concepts applied to it is good,
> but untested. For instance, take a reasonably complex model of Andre's
> and work out how it will look on the filesystem and what Subversion
> versioning would result in.
>   
I think Andre already has a layout for his model (with relative URLs). 
Letting the author decide what it looks like is probably a good first step.
> While in this thread, I don't believe metadata should be treated any
> differently to model data. Adding special rules for versioning of some
> data and not others is going to complicate the versioning process and
> I can't see any compelling reason to do this.
I agree, at least for metadata about the model (permissions etc. are of 
course a special case).
> Remember that the
> Subversion system is versioning file objects which will contain both
> metadata and CellML model data. What is important is how and where
> metadata is stored. Perhaps metadata should be separated into its own
> document sitting next to the model in the filesystem.
>   
Model is a confusing word, because CellML 1.1 models can combine several 
models to make one mathematical model. There is a case for metadata / a 
manifest about the mathematical model, as well as metadata about each of 
the CellML models that make up the mathematical model.
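A manifest like this could be derived from the top-level document's CellML 1.1 `<import>` elements. A minimal Python sketch using the standard library's ElementTree: it only follows direct imports (not imports of imports), and the `.rdf` sidecar naming for per-document metadata is a hypothetical convention, not part of CellML.

```python
import xml.etree.ElementTree as ET

CELLML_11 = "http://www.cellml.org/cellml/1.1#"
XLINK = "http://www.w3.org/1999/xlink"


def build_manifest(top_level_name: str, top_level_xml: str) -> dict:
    """List the CellML documents that make up one mathematical model.

    Each document is paired with a sidecar metadata file; the ".rdf" suffix
    is a hypothetical naming convention.
    """
    root = ET.fromstring(top_level_xml)
    # Direct imports only: each <import xlink:href="..."> names another document.
    imports = [imp.get(f"{{{XLINK}}}href") for imp in root.iter(f"{{{CELLML_11}}}import")]
    documents = [top_level_name] + imports
    return {
        "mathematical_model": root.get("name"),
        "documents": [{"document": d, "metadata": d + ".rdf"} for d in documents],
    }
```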
> My inclination is that an implementation using Subversion plus some
> Subversion hooks will be OK, but we haven't worked out details or done
> any proof of concept for this - which should be agnostic to CellML
>   
This would have the benefit of supporting non-CellML models, although it 
means that we would have to change the CellML models if we are going to 
include RDF/XML serialisations inside them.

Perhaps a generic XML framework with embedded RDF-specific parts slotted 
into it would be better.

> and focussed on how to apply Zope+CMF security and workflows to data
> objects stored in subversion repositories.
>   
If we are going to do a major rewrite, now is the time to consider 
whether we should keep using Zope, or proxy this part of the site to 
some other technology. I believe the original decision was never 
discussed at CellML meetings, and it has had a lot of unfortunate 
consequences, so I don't think reconsidering the technology is 
completely out of the question. The fact that we are already using Zope 
probably carries some weight in the decision, but other factors might be 
enough to tip the balance the other way.
>   
>> - Zope has revision control
>> Until someone packs the database.
>>     
>
> Perhaps you should look at http://plone.org/products/plone/roadmap/8
> (which is now completed and merged into Plone 3). There are some other
> add on products - some listed in
> http://plone.org/products/by-category/versioning-staging
>
>
>   
>> - Zope/Plone is also quite slow.
>>     
>
> Really? How so?
>   
I think an interpreted language, even a byte-compiled one, will always 
be slower than compiled code, and all the layers of abstraction in Zope 
and Plone probably make this worse. However, given the recent thread 
about network speeds, I'm not sure it is the bottleneck for most users.
>   
>> - Code we have now cannot get away from original design flaws.  Might as 
>> well start from scratch.
>>     
>
> Refactoring may achieve the outcome better.
>   
I agree that this will be better in general (throwing everything away 
is probably a bit drastic; I am sure that some parts of the code are 
still usable). Of course, if we move off Python this might be the only 
option, so we should keep an open mind but be wary of the costs of 
doing so.

Best regards,
Andrew

_______________________________________________
cellml-discussion mailing list
cellml-discussion@cellml.org
http://www.cellml.org/mailman/listinfo/cellml-discussion