Re: [cellml-discussion] Concerning the CellML Model Repository

Tommy Yu Mon, 25 Jun 2007 21:31:06 -0700

Hi,

I thought Andrew's ideas here were worth expanding, so I wrote a page based on 
them.


http://www.cellml.org/Members/tommy/BaseRepository

Cheers,
Tommy.



Andrew Miller wrote:
> Matt wrote:
>>> - Version/Variant
>>> It has already clogged up the system.  There is no proper revision control 
>>> mechanism; what we have now is an ad-hoc emulation of one.
>>>     
>> I don't think it has clogged the system; I just think it has been
>> used improperly, both by authors and by the user interface. This is no
>> fault of the authors; there is simply no specification for versioning.
>> The hope is that Subversion applies well to this.
>>   
> I think that the versioning system itself is the root of the problem, 
> because it is simultaneously too complicated and too limited.
> 
> In particular:
> Branching is inherently hierarchical: branches can be made from other 
> branches to arbitrary depth. However, the variant / version system does 
> not provide the proper tools to deal with this, because it is limited to 
> two levels (variant and version), beyond which it can no longer track 
> what is derived from what.
> 
> It is also inadequate because a new model might combine parts of other 
> models, especially if it is a CellML 1.1 model, and these parts need to be 
> tracked individually.
> 
> I think that the solution is to simplify down to a single global version 
> number that is common across the repository or the model (like in 
> Subversion), and then let either the CellML metadata, or perhaps the 
> Subversion copy history, describe the way a model has been derived.
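
A minimal sketch of how a single global revision number plus copy history could record derivation. It assumes a toy in-memory store; `Repository`, `commit`, `copy` and `lineage` are hypothetical names, not existing CellML repository code, and a real system would presumably rely on Subversion's own revision numbers and copy records instead.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Commit:
    revision: int                                  # global revision number
    model_id: str                                  # which model directory changed
    content: str                                   # stand-in for the model files
    copied_from: Optional[Tuple[str, int]] = None  # (model_id, revision) of the source, if a copy


@dataclass
class Repository:
    """Hypothetical toy repository with one revision counter shared by all models."""
    head: int = 0
    commits: List[Commit] = field(default_factory=list)

    def commit(self, model_id: str, content: str,
               copied_from: Optional[Tuple[str, int]] = None) -> int:
        self.head += 1  # every commit, to any model, bumps the global number
        self.commits.append(Commit(self.head, model_id, content, copied_from))
        return self.head

    def get(self, model_id: str, revision: int) -> Commit:
        """Return the state of model_id as of the given global revision."""
        for c in reversed(self.commits):  # commits are stored in revision order
            if c.model_id == model_id and c.revision <= revision:
                return c
        raise KeyError(f"{model_id} does not exist at revision {revision}")

    def copy(self, src_id: str, src_rev: int, new_id: str) -> int:
        """Start a new model as a copy of src_id at src_rev, recording the origin."""
        src = self.get(src_id, src_rev)
        return self.commit(new_id, src.content, copied_from=(src_id, src_rev))

    def lineage(self, model_id: str) -> List[str]:
        """Follow copy history from model_id back to the original model."""
        chain = [model_id]
        while True:
            first = next(c for c in self.commits if c.model_id == chain[-1])
            if first.copied_from is None:
                return chain
            chain.append(first.copied_from[0])


JOHN = "0ff280ef-dce6-4a42-a275-c9a7d9699096"
MING = "7a8996e1-8d05-4a29-a7d8-622d047804fc"

repo = Repository()
repo.commit(JOHN, "first draft")     # revision 1
r2 = repo.commit(JOHN, "published")  # revision 2
repo.copy(JOHN, r2, MING)            # revision 3, remembers (JOHN, 2)
print(repo.head)                     # -> 3
print(repo.lineage(MING))            # Ming's model, then John's
```

Because every model shares one counter, "John's model at revision 2" is unambiguous, and derivation is read from the copy record rather than from a variant/version label.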
> 
> I see the following workflow as being both simpler and more general...
> 
> John Doe creates a new model directory which has its primary URL at:
> http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/
> 
> John now owns this model and is the only one who can change it. John 
> also gets to decide the visibility of different revisions of the model.
> 
> John makes several revisions to the model (each of which bumps the 
> global revision number). There is a URL by which each historic version 
> can be referred to.
> 
> John then submits a paper about the model to a journal, referring to the 
> model by its primary URL (or perhaps by a short form, if we want to offer 
> authors the option of assigning one). After the paper is accepted by a 
> peer-reviewed journal, John updates the metadata on the model. When he 
> commits these changes, the repository detects this and creates a new alias, e.g. at:
> http://www.cellml.org/models/citation/doe_2007_1/
> 
> John makes some further changes to his model post-publication and 
> commits them. However, by some mechanism (perhaps the change 
> metadata?) the repository knows that John made this change after 
> publication.
> 
> Mary notices a discrepancy between the model and John's published 
> paper (assuming that he didn't reference the CellML model in the 
> paper). She creates a new primary URL containing a copy of John's 
> published model, at:
> http://www.cellml.org/models/id/281ab697-4607-4fcf-a433-f3ec382fb445/
> She asks John to check it. When John agrees, she updates the metadata 
> on her model to indicate that her version more correctly represents 
> John's paper. The repository then updates 
> http://www.cellml.org/models/citation/doe_2007_1/ so that it refers to 
> the fixed version of John's model.
> 
> John merges in Mary's changes to 
> http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/ 
> and continues working on more changes. He starts collaborating with 
> Mary, so he grants her write access to 
> http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/.
> 
> Ming wants to create a derivative of John's published model, so he creates a copy 
> of the revision referenced from 
> http://www.cellml.org/models/citation/doe_2007_1/ at 
> http://www.cellml.org/models/id/7a8996e1-8d05-4a29-a7d8-622d047804fc/ 
> and starts working on it (marking up the history in the model metadata).
> 
> As you can see, instead of having a confusing mix of variants and 
> versions (with versions of variants of versions of variants), having a 
> single revision number forces us to record derivation in the metadata, 
> which is general enough to avoid the problems we have seen.
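
The citation aliases in the workflow above could be as simple as a mutable table from an alias to a (model id, revision) pair. This is a sketch under that assumption only: the class name, method names and revision numbers are invented for illustration, and only the citation URL pattern comes from the message itself.

```python
from typing import Dict, Tuple

BASE = "http://www.cellml.org/models"


class CitationAliases:
    """Hypothetical table mapping citation aliases to a (model_id, revision) pair."""

    def __init__(self) -> None:
        self._targets: Dict[str, Tuple[str, int]] = {}

    def publish(self, alias: str, model_id: str, revision: int) -> None:
        # Created when the author marks a revision as published in its metadata.
        if alias in self._targets:
            raise ValueError(f"alias {alias!r} already exists")
        self._targets[alias] = (model_id, revision)

    def supersede(self, alias: str, model_id: str, revision: int) -> None:
        # Re-point the alias once the original author endorses a corrected copy.
        if alias not in self._targets:
            raise KeyError(alias)
        self._targets[alias] = (model_id, revision)

    def resolve(self, alias: str) -> Tuple[str, int]:
        return self._targets[alias]

    @staticmethod
    def citation_url(alias: str) -> str:
        return f"{BASE}/citation/{alias}/"


JOHN = "0ff280ef-dce6-4a42-a275-c9a7d9699096"
MARY = "281ab697-4607-4fcf-a433-f3ec382fb445"

aliases = CitationAliases()
aliases.publish("doe_2007_1", JOHN, 12)    # hypothetical revision numbers
aliases.supersede("doe_2007_1", MARY, 15)  # after John approves Mary's fix
print(aliases.citation_url("doe_2007_1"))
print(aliases.resolve("doe_2007_1"))
```

The citation URL itself never changes; only the revision it resolves to does, which is what lets readers of the paper land on the corrected model.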
> 
>>> - It's CellML Code, right?
>>> Why not put code in a real code management system, like Subversion?
>>>     
>> Subversion works well for file trees of code and text data, and to
>> some extent for binary data whose contents we don't really need to
>> query. If this applies well to CellML modelling, then Subversion is
>> probably a good match. Subversion will bring its own complexities when
>> we apply security to file
>> objects,
> It depends on whether or not we actually allow untrusted users direct 
> access to Subversion.
> A simple approach would be to make everyone go through the front-end 
> (which might even implement enough methods to let Subversion clients 
> check out from it anyway).
> 
>> and security/publishing in general will get even more complex
>> if we are proxying remote repositories, which we talked about a few
>> weeks ago.
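
A rough sketch of the "everyone goes through the front-end" approach, assuming the front-end keeps its own ownership and write-grant records and is the only client with commit rights to Subversion. `FrontEnd` and its methods are hypothetical, and the actual hand-off to Subversion is left out.

```python
from typing import Dict, List, Set, Tuple


class FrontEnd:
    """Hypothetical gatekeeper: untrusted users never touch Subversion directly."""

    def __init__(self) -> None:
        self.owner: Dict[str, str] = {}         # model_id -> owning user
        self.writers: Dict[str, Set[str]] = {}  # model_id -> users allowed to commit
        self.accepted: List[Tuple[str, str, str]] = []  # commits that passed the check

    def create_model(self, model_id: str, user: str) -> None:
        if model_id in self.owner:
            raise ValueError(f"{model_id} already exists")
        self.owner[model_id] = user
        self.writers[model_id] = {user}

    def grant_write(self, model_id: str, granter: str, grantee: str) -> None:
        # Only the owner decides who else may change the model.
        if self.owner.get(model_id) != granter:
            raise PermissionError(f"{granter} does not own {model_id}")
        self.writers[model_id].add(grantee)

    def can_write(self, model_id: str, user: str) -> bool:
        return user in self.writers.get(model_id, set())

    def commit(self, model_id: str, user: str, content: str) -> None:
        if not self.can_write(model_id, user):
            raise PermissionError(f"{user} may not commit to {model_id}")
        # A real deployment would pass the change to Subversion here,
        # under a trusted account; this sketch only records it.
        self.accepted.append((model_id, user, content))


MODEL = "0ff280ef-dce6-4a42-a275-c9a7d9699096"

fe = FrontEnd()
fe.create_model(MODEL, "john")
print(fe.can_write(MODEL, "mary"))  # -> False
fe.grant_write(MODEL, "john", "mary")
print(fe.can_write(MODEL, "mary"))  # -> True
fe.commit(MODEL, "mary", "merged fix")
```

Keeping the permission checks in the front-end means Subversion itself only ever sees one trusted committer, which sidesteps per-file security inside the repository.
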
>>
>> Generally, I think the concept of laying out CellML models in a
>> filesystem and applying Subversion versioning concepts to them is
>> good, but untested. For instance, we could take a reasonably complex
>> model of Andre's and work out how it would look on the filesystem and
>> what Subversion versioning would produce.
>>   
> I think Andre already has a layout for his model (with relative URLs). 
> Letting the author decide what it looks like is probably a good first step.
>> While we are on the subject, I don't believe metadata should be treated any
>> differently to model data. Adding special rules for versioning of some
>> data and not others is going to complicate the versioning process and
>> I can't see any compelling reason to do this.
> I agree (at least for metadata about the model; permissions etc. are a 
> special case, of course).
>> Remember that the
>> Subversion system is versioning file objects which will contain both
>> metadata and CellML model data. What is important is how and where
>> metadata is stored. Perhaps metadata should be separated into its own
>> document sitting next to the model in the filesystem.
>>   
> "Model" is a confusing word because CellML 1.1 models can combine several 
> models to make one mathematical model. There is a case for metadata / 
> manifest about the mathematical model as well as metadata about each of the 
> CellML models that make up the mathematical model.
>> My inclination is that an implementation using Subversion plus some
>> Subversion hooks will be OK, but we haven't worked out the details or done
>> any proof of concept for this, which should be agnostic to CellML
>>   
> This would have the benefit of supporting non-CellML models, although it 
> means that we have to change the CellML models if we are going to 
> include RDF/XML serialisations inside them.
> 
> Perhaps a generic framework, with some XML containing embedded RDF-specific 
> parts slotted into it, would be better.
> 
>> and focussed on how to apply Zope+CMF security and workflows to data
>> objects stored in Subversion repositories.
>>   
> If we are going to be doing a major rewrite, now is the time to 
> consider whether we should be using Zope at all, or whether we want to 
> proxy this part of the site to some other technology. I don't think the 
> original decision was discussed at CellML meetings at all, and it has had 
> a lot of unfortunate consequences, so reconsidering the technology is not 
> completely out of the question. The fact that we are already using Zope 
> probably carries some weight, but other factors might be enough to tip 
> the balance in another direction.
>>   
>>> - Zope has revision control
>>> Until someone packs the database.
>>>     
>> Perhaps you should look at http://plone.org/products/plone/roadmap/8
>> (which is now completed and merged into Plone 3). There are some other
>> add on products - some listed in
>> http://plone.org/products/by-category/versioning-staging
>>
>>
>>   
>>> - Zope/Plone is also quite slow.
>>>     
>> Really? How so?
>>   
> I think an interpreted language, even a byte-compiled one, will always 
> be slow, and all the layers of abstraction from Zope and Plone probably 
> make this worse. However, I'm not sure that it is the bottleneck for the 
> majority of users given the recent thread about network speeds.
>>   
>>> - Code we have now cannot get away from original design flaws.  Might as 
>>> well start from scratch.
>>>     
>> Refactoring may achieve the outcome better.
>>   
> I agree that this will be better in general (throwing away everything is 
> probably a bit drastic; I am sure that some parts of the code are still 
> usable). Of course, if we move off Python, starting from scratch might be 
> the only option, so we should keep an open mind but be wary of the costs 
> of doing so.
> 
> Best regards,
> Andrew
> 
> _______________________________________________
> cellml-discussion mailing list
> [email protected]
> http://www.cellml.org/mailman/listinfo/cellml-discussion
