Re: [cellml-discussion] Concerning the CellML Model Repository

Tommy Yu Tue, 26 Jun 2007 01:49:09 -0700

Matt wrote:
> I don't understand the purpose of this.
> 
> It looks like you are inventing a versioning system to implement from scratch.
>


That's what it looks like, but if you recall the software choices I wrote 
down, I have been considering them and going through the features they offer 
that would be of use to us.  I am being software-agnostic here, since someone 
would like me to stay away from the underlying pieces for the moment.  I am 
describing a generic concept of a repository in the context of CellML model 
development, and outlining what may turn out to be best practices.

> I don't see how this system would work with someone working on a
> filesystem and not wanting to use a browser - you'd have to invent
> client software for this.
> 
> Start by reviewing things like:
> 
> subversion
> svk
> darcs
> monotone
> arch
> etc
> 

I did.  I have already suggested using Subversion as a possible backend (I also 
reviewed how Git might be a reasonable choice if we proxy remote repositories), 
possibly an RDBMS to help with the relationship aspect of models, and Plone/Zope 
for the workflow states and the web presentation front-end.

> Review them in the context of the use-cases that need to be satisfied.
> 
> Include use-cases such as someone working on a complex model that uses
> imports of models in a local space. Include use-cases of someone
> wanting to follow volatile vs non-volatile versions/branches, etc.
> 

If model builders develop their models in their local space, they can import 
items from within their projects via relative paths (no different from working 
locally on their storage device).  If they rely on other models, they can 
import either a specific frozen version of a model or the development version 
from the repository.  Volatile versions are available to anyone who needs them.
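
To make the frozen-versus-development distinction concrete, here is a rough 
sketch in Python.  It is not tied to any particular backend: ModelStore, commit 
and fetch are names made up for illustration, and the single repository-wide 
revision counter is an assumption borrowed from how Subversion numbers commits.

```python
from typing import Dict, List, Optional, Tuple


class ModelStore:
    """Toy in-memory store (hypothetical) with one revision counter
    shared by every model, similar in spirit to Subversion."""

    def __init__(self) -> None:
        self.revision = 0  # repository-wide counter, bumped on every commit
        self._history: Dict[str, List[Tuple[int, str]]] = {}

    def commit(self, model_id: str, content: str) -> int:
        """Store new content for a model and return the new global revision."""
        self.revision += 1
        self._history.setdefault(model_id, []).append((self.revision, content))
        return self.revision

    def fetch(self, model_id: str, revision: Optional[int] = None) -> str:
        """Return the model as of `revision` (frozen), or the latest
        content when `revision` is None (development version)."""
        history = self._history[model_id]
        if revision is None:
            return history[-1][1]
        # Keep only the commits made at or before the requested revision.
        candidates = [text for rev, text in history if rev <= revision]
        if not candidates:
            raise KeyError(f"{model_id} did not exist at revision {revision}")
        return candidates[-1]
```

An importing model that pins a revision keeps getting the same content from 
fetch(model_id, revision) however many commits follow, while fetch(model_id) 
tracks the volatile development version.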

> Include the environments from which you expect this versioning system
> to work (e.g. commands on a filesystem, webdav, etc).
> 

If it's Subversion, someone could run svn ci or use a GUI client to update 
models.  They could also update via the web.
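
For the command-line case, a script could drive the svn client along these 
lines.  This is only a sketch: it assumes Subversion ends up being the backend 
and that the svn client is installed, and the helper names are invented here.  
The svn subcommands themselves (checkout, and commit, for which ci is the short 
alias) are the standard ones.

```python
import subprocess
from typing import List


def svn_checkout_cmd(repo_url: str, working_dir: str) -> List[str]:
    """Build the argument list for checking out a working copy."""
    return ["svn", "checkout", repo_url, working_dir]


def svn_commit_cmd(working_dir: str, message: str) -> List[str]:
    """Build the argument list for committing local changes
    (the long form of `svn ci`)."""
    return ["svn", "commit", "-m", message, working_dir]


def run(cmd: List[str]) -> None:
    """Run a prepared command, raising CalledProcessError on failure."""
    subprocess.run(cmd, check=True)
```

A modeller's script would call run(svn_checkout_cmd(url, "my_model")) once, 
edit the files locally, and then call run(svn_commit_cmd("my_model", "Fix 
units")), where url is whatever address the repository ends up exposing.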

> What are the kinds of relationships between permissions and roles. I
> know you have some ideas here, but it's not very replete and perhaps
> needs to be put in a table.
> 

They will be put in a table.

> I think aliases for web URIs are the least of the problems at the moment.
> 
> On 6/26/07, Tommy Yu <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> I think Andrew's ideas here are worth expanding, so I wrote a page based 
>> on them.
>>
>> http://www.cellml.org/Members/tommy/BaseRepository
>>
>> Cheers,
>> Tommy.
>>
>>
>>
>> Andrew Miller wrote:
>>> Matt wrote:
>>>>> - Version/Variant
>>>>> It already clogged up the system.  There is no proper revision control 
>>>>> mechanism, what we have now is an ad-hoc emulated system.
>>>>>
>>>> I don't think it has clogged the system; I just think it has been
>>>> improperly used both by authors and by the user interface. This is no
>>>> fault of the authors, there is simply a specification for versioning
>>>> that is missing. The hope is that subversion applies well to this.
>>>>
>>> I think that the versioning system itself is the root of the problem,
>>> because it is simultaneously too complicated and too limited.
>>>
>>> In particular:
>>> Branching is inherently a hierarchical process with arbitrary depth, in
>>> the sense that branches can be made from branches to an arbitrary depth.
>>> However, the variant / version system does not really provide the proper
>>> tools to deal with this, because it is limited to two levels (variant
>>> and version) before its utility in tracking what is a derivative of what
>>> is exhausted.
>>>
>>> It is also inadequate because a new model might combine parts of other
>>> models, especially if it is a 1.1 model, and these parts need to be
>>> tracked individually.
>>>
>>> I think that the solution is to simplify down to a single global version
>>> number that is common across the repository or the model (like in
>>> Subversion), and then let either the CellML metadata, or perhaps the
>>> Subversion copy history, describe the way a model has been derived.
>>>
>>> I see the following workflow as being both simpler and more general...
>>>
>>> John Doe creates a new model directory which has its primary URL at:
>>> http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/
>>>
>>> John now owns this model and is the only one who can change it. John
>>> also gets to decide the visibility of different revisions of the model.
>>>
>>> John makes several revisions to the model (each of which bumps the
>>> global revision number). There is a URL by which each historic version
>>> can be referred to.
>>>
>>> John then publishes the model in a journal, referring to it by the
>>> primary URL (or perhaps a short-form if we want to offer authors the
>>> option of assigning one). After the paper is accepted by a peer-reviewed
>>> journal, John updates the metadata on the model. When he commits these
>>> changes, the repository sees this and creates a new alias, e.g. at:
>>> http://www.cellml.org/models/citation/doe_2007_1/
>>>
>>> John makes some further changes to his model post-publication and
>>> commits them. However, by some mechanism (perhaps by the change
>>> metadata?) the repository knows that this is a change which occurred
>>> post-publication by John.
>>>
>>> Mary notices that there was a discrepancy between the model and John's
>>> published paper (assuming that he didn't reference the CellML model in
>>> the paper). She creates a new primary URL containing a copy of John's
>>> published model, at:
>>> http://www.cellml.org/models/id/281ab697-4607-4fcf-a433-f3ec382fb445/
>>> She gets John to check this. When John agrees, she updates the metadata
>>> on her model to indicate that her version is a more correct version of
>>> John's paper. The repository then updates so that
>>> http://www.cellml.org/models/citation/doe_2007_1/ is a reference to
>>> John's fixed version.
>>>
>>> John merges in Mary's changes to
>>> http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/
>>> and continues working on more changes. He starts collaborating with
>>> Mary, so he grants her write access to
>>> http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/.
>>>
>>> Ming wants to create a derivative of John's paper, so he creates a copy
>>> of the revision referenced from
>>> http://www.cellml.org/models/citation/doe_2007_1/ at
>>> http://www.cellml.org/models/id/7a8996e1-8d05-4a29-a7d8-622d047804fc/
>>> and starts working on it (marking up the history in the model metadata).
>>>
>>> As you can see, instead of having a confusing mix of variants and
>>> versions (with versions of variants of versions of variants), having a
>>> single revision forces us to look at the metadata instead, which then is
>>> sufficiently general not to have the problems we have seen.
>>>
>>>>> - It's CellML Code, right?
>>>>> Why not put code in a real code management system, like Subversion?
>>>>>
>>>> Subversion works well for filesystems of code and text data and to
>>>> some extent binary data that we don't really need to query the
>>>> contents of. If this applies well for CellML modelling, then
>>>> subversion is probably a good match. Subversion will bring its own
>>>> complexities when we are dealing with applying security to file
>>>> objects,
>>> It depends whether or not we actually allow direct access to Subversion
>>> by untrusted users.
>>> A simple approach would be to make everyone go through the front-end
>>> (which might even implement enough methods to let Subversion check out
>>> from there anyway).
>>>
>>>> and security/publishing in general will get even more complex
>>>> if we are proxying remote repositories - which we talked about a few
>>>> weeks ago.
>>>>
>>>> Generally, I think the concept of cellml modelling being laid out in a
>>>> filesystem and subversion versioning concepts applied to it is good,
>>>> but untested. For instance, take a reasonably complex model of Andre's
>>>> and work out how it will look on the filesystem and  what subversion
>>>> versioning would result in.
>>>>
>>> I think Andre already has a layout for his model (with relative URLs).
>>> Letting the author decide what it looks like is probably a good first step.
>>>> While in this thread, I don't believe metadata should be treated any
>>>> differently to model data. Adding special rules for versioning of some
>>>> data and not others is going to complicate the versioning process and
>>>> I can't see any compelling reason to do this.
>>> I agree (for metadata about the model at least. Permissions etc... are a
>>> special case of course).
>>>>  Remember that the
>>>> subversion system is versioning file objects which will contain both
>>>> metadata and cellml model data. What is important is how and where
>>>> metadata is stored. Perhaps metadata should be separated into its own
>>>> document sitting next to the model in the filesystem.
>>>>
>>> Model is a confusing word because CellML 1.1 models can combine several
>>> models to make one mathematical model. There is a case for metadata /
>>> manifest about the mathematical model as well as metadata about each of the
>>> CellML models that make up the mathematical model.
>>>> My inclination is that an implementation using subversion plus some
>>>> subversion hooks will be ok, but we haven't worked out details or done
>>>> any proof of concept for this - which should be agnostic to cellml
>>>>
>>> This would have the benefit of supporting non-CellML models, although it
>>> means that we have to change the CellML models if we are going to
>>> include RDF/XML serialisations inside them.
>>>
>>>  Perhaps a generic framework with some XML with embedded RDF specific
>>> parts slotted into it would be better.
>>>
>>>> and focussed on how to apply zope+cmf security and workflows to data
>>>> objects stored in subversion repositories.
>>>>
>>> If we are going to be doing a major re-write, now is the time to
>>> consider if we should be using Zope, or if we want to proxy this part of
>>> the site to some other technology (I think that the decision the first
>>> time was not discussed at CellML meetings at all, and has had a lot of
>>> unfortunate consequences, so I don't think it is completely out of the
>>> question to reconsider technologies. The fact that we are already using
>>> it probably carries some weight in the decision, but other factors might
>>> be enough to tip the balance in another direction).
>>>>> - Zope has revision control
>>>>> Until someone packs the database.
>>>>>
>>>> Perhaps you should look at http://plone.org/products/plone/roadmap/8
>>>> (which is now completed and merged into Plone 3). There are some other
>>>> add on products - some listed in
>>>> http://plone.org/products/by-category/versioning-staging
>>>>
>>>>
>>>>
>>>>> - Zope/Plone is also quite slow.
>>>>>
>>>> Really? How so?
>>>>
>>> I think an interpreted language, even a byte-compiled one, will always
>>> be slow, and all the layers of abstraction from Zope and Plone probably
>>> make this worse. However, I'm not sure that it is the bottleneck for the
>>> majority of users given the recent thread about network speeds.
>>>>> - Code we have now cannot get away from original design flaws.  Might as 
>>>>> well start from scratch.
>>>>>
>>>> Refactoring may achieve the outcome better.
>>>>
>>> I agree that this will be better in general (throwing away everything is
>>> probably a bit drastic, I am sure that there are some parts of the code
>>> that are still usable). Of course, if we move off Python this might be
>>> the only option, so we should keep an open mind but be wary of the costs
>>> of doing so.
>>>
>>> Best regards,
>>> Andrew
>>>
>>> _______________________________________________
>>> cellml-discussion mailing list
>>> [email protected]
>>> http://www.cellml.org/mailman/listinfo/cellml-discussion