Re: [cellml-discussion] Concerning the CellML Model Repository

Matt Tue, 26 Jun 2007 00:55:04 -0700

I don't understand the purpose of this.

It looks like you are inventing a versioning system to implement from scratch.


I don't see how this system would work with someone working on a
filesystem and not wanting to use a browser - you'd have to invent
client software for this.

Start by reviewing things like:

subversion
svk
darcs
monotone
arch
etc

Review them in the context of the use-cases that need to be satisfied.

Include use-cases such as someone working on a complex model that uses
imports of models in a local space. Include use-cases of someone
wanting to follow volatile vs non-volatile versions/branches, etc.
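
To make the local-imports case concrete, here is a minimal sketch of
what a client would need to work out before versioning anything: which
other files a model pulls in. It assumes CellML 1.1 import elements
carrying an xlink:href attribute; the function name and the example
paths are made up for illustration.

```python
import posixpath
import xml.etree.ElementTree as ET

CELLML_1_1 = "http://www.cellml.org/cellml/1.1#"
XLINK = "http://www.w3.org/1999/xlink"


def local_imports(model_path, model_xml):
    """Return the files a model imports, resolved against the model's directory.

    Only relative hrefs are returned; absolute URLs are skipped because
    they do not live in the local working space.
    """
    root = ET.fromstring(model_xml)
    base = posixpath.dirname(model_path)
    resolved = []
    for imp in root.iter(f"{{{CELLML_1_1}}}import"):
        href = imp.get(f"{{{XLINK}}}href")
        if href and "://" not in href:
            resolved.append(posixpath.normpath(posixpath.join(base, href)))
    return resolved
```

Whatever versioning system is picked has to keep a model and the files
returned here consistent with each other across revisions.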

Include the environments from which you expect this versioning system
to work (e.g. commands on a filesystem, webdav, etc).

What kinds of relationships are there between permissions and roles? I
know you have some ideas here, but they are not very complete and
perhaps need to be put in a table.
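
As a starting point, such a table could be as small as the sketch
below. The role names (owner, collaborator, reader) and the permission
names are assumptions for illustration, loosely based on the workflow
quoted further down; none of this is an agreed design.

```python
# Hypothetical permissions and roles; the real set still has to be decided.
PERMISSIONS = ("read", "write", "set_visibility", "grant_access")

ROLE_TABLE = {
    "owner": {"read", "write", "set_visibility", "grant_access"},
    "collaborator": {"read", "write"},
    "reader": {"read"},
}


def allowed(role, permission):
    """Return True if the role grants the permission; unknown roles get nothing."""
    return permission in ROLE_TABLE.get(role, set())


def render_table():
    """Render the role/permission matrix as plain text, one row per role."""
    rows = [("role".ljust(14) + " ".join(p.ljust(14) for p in PERMISSIONS)).rstrip()]
    for role, granted in ROLE_TABLE.items():
        cells = " ".join(("yes" if p in granted else "-").ljust(14) for p in PERMISSIONS)
        rows.append((role.ljust(14) + cells).rstrip())
    return "\n".join(rows)
```

Calling print(render_table()) gives a table that can be pasted into the
wiki page and argued over row by row.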

I think aliases for web URIs are the least of the problems at the moment.

On 6/26/07, Tommy Yu <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I thought Andrew's ideas here were worth expanding, so I wrote a page based on
> them.
>
> http://www.cellml.org/Members/tommy/BaseRepository
>
> Cheers,
> Tommy.
>
>
>
> Andrew Miller wrote:
> > Matt wrote:
> >>> - Version/Variant
> >>> It already clogged up the system.  There is no proper revision control 
> >>> mechanism, what we have now is an ad-hoc emulated system.
> >>>
> >> I don't think it has clogged the system; I just think it has been
> >> improperly used both by authors and by the user interface. This is no
> >> fault of the authors, there is simply a specification for versioning
> >> that is missing. The hope is that subversion applies well to this.
> >>
> > I think that the versioning system itself is the root of the problem,
> > because it is simultaneously too complicated and too limited.
> >
> > In particular:
> > Branching is inherently a hierarchical process with arbitrary depth, in
> > the sense that branches can be made from branches to an arbitrary depth.
> > However, the variant / version system does not really provide the proper
> > tools to deal with this, because it is limited to two levels (variant
> > and version) before its utility in tracking what is a derivative of what
> > is exhausted.
> >
> > It is also inadequate because a new model might combine parts of other
> > models, especially if it is a 1.1 model, and these parts need to be
> > tracked individually.
> >
> > I think that the solution is to simplify down to a single global version
> > number that is common across the repository or the model (like in
> > Subversion), and then let either the CellML metadata, or perhaps the
> > Subversion copy history, describe the way a model has been derived.
> >
> > I see the following workflow as being both simpler and more general...
> >
> > John Doe creates a new model directory which has its primary URL at:
> > http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/
> >
> > John now owns this model and is the only one who can change it. John
> > also gets to decide the visibility of different revisions of the model.
> >
> > John makes several revisions to the model (each of which bumps the
> > global revision number). There is a URL by which each historic version
> > can be referred to.
> >
> > John then publishes the model in a journal, referring to it by the
> > primary URL (or perhaps a short-form if we want to offer authors the
> > option of assigning one). After the paper is accepted by a peer-reviewed
> > journal, John updates the metadata on the model. When he commits these
> > changes, the repository sees this and creates a new alias, e.g. at:
> > http://www.cellml.org/models/citation/doe_2007_1/
> >
> > John makes some further changes to his model post-publication and
> > commits them. However, by some mechanism (perhaps by the change
> > metadata?) the repository knows that this is a change which occurred
> > post-publication by John.
> >
> > Mary notices that there was a discrepancy between the model and John's
> > published paper (assuming that he didn't reference the CellML model in
> > the paper). She creates a new primary URL containing a copy of John's
> > published model, at:
> > http://www.cellml.org/models/id/281ab697-4607-4fcf-a433-f3ec382fb445/
> > She gets John to check this. When John agrees, she updates the metadata
> > on her model to indicate that her version is a more correct version of
> > John's paper. The repository then updates so that
> > http://www.cellml.org/models/citation/doe_2007_1/ is a reference to
> > John's fixed version.
> >
> > John merges in Mary's changes to
> > http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/
> > and continues working on more changes. He starts collaborating with
> > Mary, so he grants her write access to
> > http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/.
> >
> > Ming wants to create a derivative of John's paper, so he creates a copy
> > of the revision referenced from
> > http://www.cellml.org/models/citation/doe_2007_1/ at
> > http://www.cellml.org/models/id/7a8996e1-8d05-4a29-a7d8-622d047804fc/
> > and starts working on it (marking up the history in the model metadata).
> >
> > As you can see, instead of having a confusing mix of variants and
> > versions (with versions of variants of versions of variants), having a
> > single revision forces us to look at the metadata instead, which then is
> > sufficiently general not to have the problems we have seen.
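
A rough sketch of the workflow quoted above, to show how little state it
needs: one global revision counter, models keyed by a generated id,
citation aliases pointing at a (model, revision) pair, and a
derived-from record kept as metadata. The class and method names are
invented for this sketch and it ignores visibility and storage entirely.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Model:
    owner: str
    writers: set
    revisions: dict = field(default_factory=dict)  # global revision -> content
    derived_from: Optional[tuple] = None           # (model_id, revision) or None


class Repository:
    def __init__(self):
        self.revision = 0   # single counter shared by every model
        self.models = {}    # model_id -> Model
        self.aliases = {}   # alias name -> (model_id, revision)

    def create(self, owner, content, derived_from=None):
        """Create a model under a fresh id and commit its first revision."""
        model_id = str(uuid.uuid4())
        self.models[model_id] = Model(owner, {owner}, derived_from=derived_from)
        self.commit(model_id, owner, content)
        return model_id

    def commit(self, model_id, user, content):
        """Store new content; every commit bumps the global revision."""
        model = self.models[model_id]
        if user not in model.writers:
            raise PermissionError(f"{user} may not change {model_id}")
        self.revision += 1
        model.revisions[self.revision] = content
        return self.revision

    def grant_write(self, model_id, owner, user):
        """Let the owner give another user write access."""
        model = self.models[model_id]
        if owner != model.owner:
            raise PermissionError("only the owner can grant access")
        model.writers.add(user)

    def set_alias(self, name, model_id, revision):
        """Point a citation alias at one revision of one model."""
        self.aliases[name] = (model_id, revision)

    def resolve(self, name):
        """Return the content an alias currently refers to."""
        model_id, revision = self.aliases[name]
        return self.models[model_id].revisions[revision]

    def copy_alias(self, name, new_owner):
        """Copy the aliased revision into a new model, recording its origin."""
        source = self.aliases[name]
        return self.create(new_owner, self.resolve(name), derived_from=source)
```

In these terms, Mary's correction is a copy_alias call, a commit to her
own copy, and a set_alias that repoints the citation alias at her
revision.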
> >
> >>> - It's CellML Code, right?
> >>> Why not put code in a real code management system, like Subversion?
> >>>
> >> Subversion works well for filesystems of code and text data and to
> >> some extent binary data that we don't really need to query the
> >> contents of. If this applies well for CellML modelling, then
> >> subversion is probably a good match. Subversion will bring its own
> >> complexities when we are dealing with applying security to file
> >> objects,
> > It depends whether or not we actually allow direct access to Subversion
> > by untrusted users.
> > A simple approach would be to make everyone go through the front-end
> > (which might even implement enough methods to let Subversion check out
> > from there anyway).
> >
> >> and security/publishing in general will get even more complex
> >> if we are proxying remote repositories - which we talked about a few
> >> weeks ago.
> >>
> >> Generally, I think the concept of cellml modelling being laid out in a
> >> filesystem and subversion versioning concepts applied to it is good,
> >> but untested. For instance, take a reasonably complex model of Andre's
> >> and work out how it will look on the filesystem and  what subversion
> >> versioning would result in.
> >>
> > I think Andre already has a layout for his model (with relative URLs).
> > Letting the author decide what it looks like is probably a good first step.
> >> While in this thread, I don't believe metadata should be treated any
> >> differently to model data. Adding special rules for versioning of some
> >> data and not others is going to complicate the versioning process and
> >> I can't see any compelling reason to do this.
> > I agree (for metadata about the model at least. Permissions etc... are a
> > special case of course).
> >>  Remember that the
> >> subversion system is versioning file objects which will contain both
> >> metadata and cellml model data. What is important is how and where
> >> metadata is stored. Perhaps metadata should be separated into its own
> >> document sitting next to the model in the filesystem.
> >>
> > Model is a confusing word because CellML 1.1 models can combine several
> > models to make one mathematical model. There is a case for metadata /
> > manifest about the mathematical model as well as metadata about each of
> > the CellML models that make up the mathematical model.
> >> My inclination is that an implementation using subversion plus some
> >> subversion hooks will be ok, but we haven't worked out details or done
> >> any proof of concept for this - which should be agnostic to cellml
> >>
> > This would have the benefit of supporting non-CellML models, although it
> > means that we have to change the CellML models if we are going to
> > include RDF/XML serialisations inside them.
> >
> >  Perhaps a generic framework with some XML with embedded RDF specific
> > parts slotted into it would be better.
> >
> >> and focussed on how to apply zope+cmf security and workflows to data
> >> objects stored in subversion repositories.
> >>
> > If we are going to be doing a major re-write, now is the time to
> > consider if we should be using Zope, or if we want to proxy this part of
> > the site to some other technology (I think that the decision the first
> > time was not discussed at CellML meetings at all, and has had a lot of
> > unfortunate consequences, so I don't think it is completely out of the
> > question to reconsider technologies. The fact that we are already using
> > it probably carries some weight in the decision, but other factors might
> > be enough to tip the balance in another direction).
> >>
> >>> - Zope has revision control
> >>> Until someone packs the database.
> >>>
> >> Perhaps you should look at http://plone.org/products/plone/roadmap/8
> >> (which is now completed and merged into Plone 3). There are some other
> >> add on products - some listed in
> >> http://plone.org/products/by-category/versioning-staging
> >>
> >>
> >>
> >>> - Zope/Plone is also quite slow.
> >>>
> >> Really? How so?
> >>
> > I think an interpreted language, even a byte-compiled one, will always
> > be slow, and all the layers of abstraction from Zope and Plone probably
> > make this worse. However, I'm not sure that it is the bottleneck for the
> > majority of users given the recent thread about network speeds.
> >>
> >>> - Code we have now cannot get away from original design flaws.  Might as 
> >>> well start from scratch.
> >>>
> >> Refactoring may achieve the outcome better.
> >>
> > I agree that this will be better in general (throwing away everything is
> > probably a bit drastic, I am sure that there are some parts of the code
> > that are still usable). Of course, if we move off Python this might be
> > the only option, so we should keep an open mind but be wary of the costs
> > of doing so.
> >
> > Best regards,
> > Andrew
> >
> > _______________________________________________
> > cellml-discussion mailing list
> > [email protected]
> > http://www.cellml.org/mailman/listinfo/cellml-discussion
