Re: [cellml-discussion] Concerning the CellML Model Repository

Matt Fri, 22 Jun 2007 00:09:08 -0700

On 6/22/07, Andrew Miller <[EMAIL PROTECTED]> wrote:
> Matt wrote:
> >> - Version/Variant
> >> It already clogged up the system.  There is no proper revision control 
> >> mechanism, what we have now is an ad-hoc emulated system.
> >>
> >
> > I don't think it has clogged the system; I just think it has been
> > improperly used both by authors and by the user interface. This is no
> > fault of the authors, there is simply a specification for versioning
> > that is missing. The hope is that subversion applies well to this.
> >
> I think that the versioning system itself is the root of the problem,
> because it is simultaneously too complicated and too limited.
>
> In particular:
> Branching is inherently a hierarchical process with arbitrary depth, in
> the sense that branches can be made from branches to an arbitrary depth.
> However, the variant / version system does not really provide the proper
> tools to deal with this, because it is limited to two levels (variant
> and version) before its utility in tracking what is a derivative of what
> is exhausted.
>
> It is also inadequate because a new model might combine parts of other
> models, especially if it is a 1.1 model, and these parts need to be
> tracked individually.
>
> I think that the solution is to simplify down to a single global version
> number that is common across the repository or the model (like in
> Subversion), and then let either the CellML metadata, or perhaps the
> Subversion copy history, describe the way a model has been derived.


Sure, so disregarding variants for now, there is nothing stopping this
being implemented with the current versioning/naming convention.
There's just no specification for proper use. However I think
changesets (as well as global versions) apply well to the notion of a
workspace, but I'm not certain about the common practice of
trunk/branch roots as applied to cellml - perhaps the best practice
would be that every workspace would be the trunk root.
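
To make the changeset / global version idea concrete, here is a minimal sketch, assuming each workspace keeps a single revision counter shared by all of its files (as Subversion does per repository). The `Workspace` class, its method names, and the file names are hypothetical illustrations, not part of any existing repository code.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Hypothetical workspace with one revision counter shared by all files."""
    revision: int = 0
    # history[n - 1] holds the full file snapshot committed as revision n.
    history: list = field(default_factory=list)

    def commit(self, changes: dict) -> int:
        """Apply a changeset (path -> new content) and return the new revision."""
        snapshot = dict(self.history[-1]) if self.history else {}
        snapshot.update(changes)
        self.history.append(snapshot)
        self.revision += 1
        return self.revision

    def checkout(self, revision: int) -> dict:
        """Return the files as they were at the given revision."""
        if not 1 <= revision <= self.revision:
            raise ValueError(f"no such revision: {revision}")
        return dict(self.history[revision - 1])


ws = Workspace()
r1 = ws.commit({"model.cellml": "v1", "units.cellml": "u1"})
r2 = ws.commit({"model.cellml": "v2"})  # one changeset, one revision bump
```

Here a changeset that touches only one file still bumps the revision for the whole workspace, so any revision number identifies a consistent set of files. Storing full snapshots is only for brevity; a real system would store deltas.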

>
> I see the following workflow as being both simpler and more general...
>
> John Doe creates a new model directory which has its primary URL at:
> http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/
>
> John now owns this model and is the only one who can change it. John
> also gets to decide the visibility of different revisions of the model.

Is this changing the model, or the model + metadata?

>
> John makes several revisions to the model (each of which bumps the
> global revision number). There is a URL by which each historic version
> can be referred to.
>
> John then publishes the model in a journal, referring to it by the
> primary URL (or perhaps a short-form if we want to offer authors the
> option of assigning one). After the paper is accepted by a peer-reviewed
> journal, John updates the metadata on the model. When he commits these
> changes, the repository sees this and creates a new alias, e.g. at:
> http://www.cellml.org/models/citation/doe_2007_1/
>
> John makes some further changes to his model post-publication and
> commits them. However, by some mechanism (perhaps by the change
> metadata?) the repository knows that this is a change which occurred
> post-publication by John.
>
> Mary notices that there was a discrepancy between the model and John's
> published paper (assuming that he didn't reference the CellML model in
> the paper). She creates a new primary URL containing a copy of John's
> published model, at:
> http://www.cellml.org/models/id/281ab697-4607-4fcf-a433-f3ec382fb445/
> She gets John to check this. When John agrees, she updates the metadata
> on her model to indicate that her version is a more correct version of
> John's paper. The repository then updates so that
> http://www.cellml.org/models/citation/doe_2007_1/ is a reference to
> John's fixed version.
>
> John merges in Mary's changes to
> http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/
> and continues working on more changes. He starts collaborating with
> Mary, so he grants her write access to
> http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/.
>
> Ming wants to create a derivative of John's paper, so he creates a copy
> of the revision referenced from
> http://www.cellml.org/models/citation/doe_2007_1/ at
> http://www.cellml.org/models/id/7a8996e1-8d05-4a29-a7d8-622d047804fc/
> and starts working on it (marking up the history in the model metadata).
>
> As you can see, instead of having a confusing mix of variants and
> versions (with versions of variants of versions of variants), having a
> single revision forces us to look at the metadata instead, which then is
> sufficiently general not to have the problems we have seen.

Yep, I reckon variants didn't work out at all and the metadata is a
better place for this information.
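
A rough sketch of the workflow quoted above, assuming the repository keeps one revision sequence per model and derives citation aliases from commit metadata. The `Repository` class and the metadata keys `citation`, `corrects` and `derived_from` are hypothetical; the step where John confirms Mary's fix is not modelled.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Model:
    owner: str
    writers: set = field(default_factory=set)
    revisions: list = field(default_factory=list)  # one metadata dict per revision


class Repository:
    def __init__(self):
        self.models = {}   # model id -> Model
        self.aliases = {}  # citation alias -> (model id, revision)

    def create(self, owner, derived_from=None):
        """Create a model under a fresh primary id; only the owner may write."""
        model_id = str(uuid.uuid4())
        self.models[model_id] = Model(owner=owner, writers={owner})
        self.commit(model_id, owner, {"derived_from": derived_from})
        return model_id

    def commit(self, model_id, user, metadata):
        """Record a new revision and update any alias named in its metadata."""
        model = self.models[model_id]
        if user not in model.writers:
            raise PermissionError(f"{user} may not change {model_id}")
        model.revisions.append(dict(metadata))
        revision = len(model.revisions)
        # Publishing creates the alias; a later correction repoints it.
        for key in ("citation", "corrects"):
            if key in metadata:
                self.aliases[metadata[key]] = (model_id, revision)
        return revision


repo = Repository()
john = repo.create("john")
published = repo.commit(john, "john", {"citation": "doe_2007_1"})
repo.commit(john, "john", {"note": "post-publication change"})
before = repo.aliases["doe_2007_1"]  # still John's published revision

mary = repo.create("mary", derived_from=(john, published))
fixed = repo.commit(mary, "mary", {"corrects": "doe_2007_1"})
after = repo.aliases["doe_2007_1"]   # now Mary's corrected revision
```

The point of the sketch is that the alias, not the version number, carries the meaning: post-publication commits do not move it, and the derivation history lives in metadata rather than in a variant/version pair.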

>
> >> - It's CellML Code, right?
> >> Why not put code in a real code management system, like Subversion?
> >>
> >
> > Subversion works well for filesystems of code and text data and to
> > some extent binary data that we don't really need to query the
> > contents of. If this applies well for CellML modelling, then
> > subversion is probably a good match. Subversion will bring its own
> > complexities when we are dealing with applying security to file
> > objects,
> It depends whether or not we actually allow direct access to Subversion
> by untrusted users.
> A simple approach would be to make everyone go through the front-end
> (which might even implement enough methods to let Subversion check out
> from there anyway).

Yup, that is one way.

>
> > and security/publishing in general will get even more complex
> > if we are proxying remote repositories - which we talked about a few
> > weeks ago.
> >
> > Generally, I think the concept of cellml modelling being laid out in a
> > filesystem and subversion versioning concepts applied to it is good,
> > but untested. For instance, take a reasonably complex model of Andre's
> > and work out how it will look on the filesystem and what subversion
> > versioning would result in.
> >
> I think Andre already has a layout for his model (with relative URLs).
> Letting the author decide what it looks like is probably a good first step.
> > While in this thread, I don't believe metadata should be treated any
> > differently to model data. Adding special rules for versioning of some
> > data and not others is going to complicate the versioning process and
> > I can't see any compelling reason to do this.
> I agree (for metadata about the model at least. Permissions etc... are a
> special case of course).
> >  Remember that the
> > subversion system is versioning file objects which will contain both
> > metadata and cellml model data. What is important is how and where
> > metadata is stored. Perhaps metadata should be separated into its own
> > document sitting next to the model in the filesystem.
> >
> Model is a confusing word because CellML 1.1 models can combine several
> models to make one mathematical model. There is a case for metadata /
> manifest about the mathematical model as well as metadata about each of
> the CellML models that make up the mathematical model.
> > My inclination is that an implementation using subversion plus some
> > subversion hooks will be ok, but we haven't worked out details or done
> > any proof of concept for this - which should be agnostic to cellml
> >
> This would have the benefit of supporting non-CellML models, although it
> means that we have to change the CellML models if we are going to
> include RDF/XML serialisations inside them.
>
> Perhaps a generic framework with some XML with embedded RDF specific
> parts slotted into it would be better.
>
> > and focussed on how to apply zope+cmf security and workflows to data
> > objects stored in subversion repositories.
> >
> If we are going to be doing a major re-write, now is the time to
> consider if we should be using Zope, or if we want to proxy this part of
> the site to some other technology (I think that the decision the first
> time was not discussed at CellML meetings at all, and has had a lot of
> unfortunate consequences, so I don't think it is completely out of the
> question to reconsider technologies. The fact that we are already using
> it probably carries some weight in the decision, but other factors might
> be enough to tip the balance in another direction).

Yep. It doesn't have to be Zope at all. It provides a reasonable
foundation though, like others in this space. Others on the block
could be Pylons, Ruby on Rails, Zope 3, maybe just Apache + CGI (but
that would be a pretty big rewrite).

> >
> >> - Zope has revision control
> >> Until someone packs the database.
> >>
> >
> > Perhaps you should look at http://plone.org/products/plone/roadmap/8
> > (which is now completed and merged into Plone 3). There are some other
> > add on products - some listed in
> > http://plone.org/products/by-category/versioning-staging
> >
> >
> >
> >> - Zope/Plone is also quite slow.
> >>
> >
> > Really? How so?
> >
> I think an interpreted language, even a byte-compiled one, will always
> be slow, and all the layers of abstraction from Zope and Plone probably
> make this worse. However, I'm not sure that it is the bottleneck for the
> majority of users given the recent thread about network speeds.

Yeah. I haven't seen a bottleneck attributable to Zope/Plone identified yet.

> >
> >> - Code we have now cannot get away from original design flaws.  Might as 
> >> well start from scratch.
> >>
> >
> > Refactoring may achieve the outcome better.
> >
> I agree that this will be better in general (throwing away everything is
> probably a bit drastic, I am sure that there are some parts of the code
> that are still usable). Of course, if we move off Python this might be
> the only option, so we should keep an open mind but be wary of the costs
> of doing so.

Any opinions on other environments you would consider? They should
probably go into the mix now.


>
> Best regards,
> Andrew
>
> _______________________________________________
> cellml-discussion mailing list
> [email protected]
> http://www.cellml.org/mailman/listinfo/cellml-discussion
>