Re: [cellml-discussion] Concerning the CellML Model Repository

Matt Thu, 21 Jun 2007 21:10:36 -0700

Hi Tommy,

Can you continue to update and fill out your document, and begin the
associated proposals, using the information in the replies people are
submitting? The goal of this process is a scoping document with
associated content.

More comments below.

On 6/22/07, Tommy Yu <[EMAIL PROTECTED]> wrote:
> Matt wrote:
> > Hi Tommy,
> >
> > I found the document seemed to be too far ahead of itself. I also
> > didn't find any of the pros and cons very compelling because they
> > don't address specific problems and those problems are not described.
> >
> > 1) What are you actually trying to achieve? It would be useful to
> > describe the parts of the current system that are giving you grief and
> > look to give you more grief based on the use cases and any axes of
> > scale.
> >
>
> Starting with what I envisioned.
>
> Who is the repository catered for?
> 1) People who would like to work on models, using it as a place to store 
> work-in-progress models.
> 2) Reviewers to review models.
> 3) Website users to browse models.
>
> 1) What do the model builders want?
> - Their own workspace (home directory)
> - A place to let reviewers review their models
> - Also to publish their models
>
> First point is not addressed by what we have now.  Second and third points
> are quite ad-hoc.  Also, version control is very ad-hoc right now.

Each of these points needs to be filled out, e.g. what does it mean to
have a workspace for a CellML modeller? What are the scenarios and
workflows for reviewers of CellML models?

>
> 2,3) Reviewers and website users
> - A centralized location to browse models.
> - They would like to see how models may relate to each other.
>

How do models relate to each other? Relations between models come from
all sorts of data within models, and within any associated metadata
(so more than just our current cellml metadata specification). It
would be useful to write out the details of the relationships that are
important here as these pretty much form the basis of many of the
queries that will need to be performed.

> First point is already addressed, but second point is definitely not possible 
> as the current repository does not support 1.1.

Why does it not support CellML 1.1? i.e. what is the technology block
here to extending the current system to support it?

>
> Issues:
> - Flat file system.
> Sure, using ZCatalog it is possible to emulate users' home directories and 
> the like, but it still does not get away from what we have now.

I don't understand this. What are you aiming for in a home space and
why doesn't the current system support it?

> - Version/Variant
> It already clogged up the system.  There is no proper revision control 
> mechanism, what we have now is an ad-hoc emulated system.

I don't think it has clogged the system; I just think it has been
improperly used both by authors and by the user interface. This is no
fault of the authors; a specification for versioning is simply
missing. The hope is that subversion applies well to this.

> - It's CellML Code, right?
> Why not put code in a real code management system, like Subversion?

Subversion works well for filesystems of code and text data and to
some extent binary data that we don't really need to query the
contents of. If this applies well for CellML modelling, then
subversion is probably a good match. Subversion will bring its own
complexities when we are dealing with applying security to file
objects, and security/publishing in general will get even more complex
if we are proxying remote repositories - which we talked about a few
weeks ago.

Generally, I think the concept of cellml modelling being laid out in a
filesystem and subversion versioning concepts applied to it is good,
but untested. For instance, take a reasonably complex model of Andre's
and work out how it will look on the filesystem and what subversion
versioning would result in.
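
As a starting point for that exercise, here is a minimal sketch that
walks a directory of CellML 1.1 files and records which file imports
which. It assumes models end in ".cellml" and that imports use relative
xlink:href values; the function names are made up for illustration and
are not part of any existing repository code.

```python
import os
import xml.etree.ElementTree as ET

CELLML_NS = "http://www.cellml.org/cellml/1.1#"
XLINK_NS = "http://www.w3.org/1999/xlink"


def find_imports(model_path):
    """Return the local paths imported by one CellML 1.1 file."""
    root = ET.parse(model_path).getroot()
    base = os.path.dirname(model_path)
    imported = []
    for node in root.iter("{%s}import" % CELLML_NS):
        href = node.get("{%s}href" % XLINK_NS)
        # Only relative hrefs are resolved; absolute URIs are skipped here.
        if href and "://" not in href:
            imported.append(os.path.normpath(os.path.join(base, href)))
    return imported


def import_graph(top_dir, suffix=".cellml"):
    """Map every model file under top_dir to the files it imports."""
    graph = {}
    for dirpath, _dirs, files in os.walk(top_dir):
        for name in files:
            if name.endswith(suffix):
                path = os.path.normpath(os.path.join(dirpath, name))
                graph[path] = find_imports(path)
    return graph


def top_level_models(graph):
    """Models that no other model in the tree imports."""
    imported = {target for targets in graph.values() for target in targets}
    return sorted(path for path in graph if path not in imported)
```

Running import_graph() over a working copy and then top_level_models()
on the result gives a first guess at which files a manifest would need
to point to.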

While in this thread, I don't believe metadata should be treated any
differently to model data. Adding special rules for versioning of some
data and not others is going to complicate the versioning process and
I can't see any compelling reason to do this. Remember that the
subversion system is versioning file objects which will contain both
metadata and cellml model data. What is important is how and where
metadata is stored. Perhaps metadata should be separated into its own
document sitting next to the model in the filesystem.

My inclination is that an implementation using subversion plus some
subversion hooks will be ok, but we haven't worked out details or done
any proof of concept for this - which should be agnostic to cellml
and focussed on how to apply zope+cmf security and workflows to data
objects stored in subversion repositories.
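
To show only the hook mechanism (not the zope+cmf security mapping,
which is still to be worked out), here is a rough sketch of a
pre-commit check built on the svnlook command. It assumes the hook is
called with the repository path and transaction name, as subversion
does for pre-commit, and rejects commits containing ".cellml" files
that are not well-formed XML. The ".cellml" suffix and the function
names are assumptions for illustration.

```python
import subprocess
import sys
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def changed_paths(svnlook_output, suffix=".cellml"):
    """Pick added or modified files with the given suffix from the
    text printed by `svnlook changed`."""
    paths = []
    for line in svnlook_output.splitlines():
        # The first four columns hold the status flags, then the path.
        status, path = line[:4].strip(), line[4:]
        # Skip deletions and directories (listed with a trailing slash).
        if status.startswith("D") or path.endswith("/"):
            continue
        if path.endswith(suffix):
            paths.append(path)
    return paths


def is_well_formed(xml_bytes):
    """True if the bytes parse as XML."""
    try:
        xml.dom.minidom.parseString(xml_bytes)
        return True
    except ExpatError:
        return False


def main(repos, txn):
    """Return 0 to accept the transaction, 1 to reject it."""
    listing = subprocess.check_output(
        ["svnlook", "changed", "-t", txn, repos])
    bad = []
    for path in changed_paths(listing.decode("utf-8")):
        content = subprocess.check_output(
            ["svnlook", "cat", "-t", txn, repos, path])
        if not is_well_formed(content):
            bad.append(path)
    for path in bad:
        sys.stderr.write("not well-formed XML: %s\n" % path)
    return 1 if bad else 0
```

Installed as the repository's pre-commit hook, the script would end
with sys.exit(main(sys.argv[1], sys.argv[2])); a non-zero exit status
makes subversion refuse the commit.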

> - Zope has revision control
> Until someone packs the database.

Perhaps you should look at http://plone.org/products/plone/roadmap/8
(which is now completed and merged into Plone 3). There are some other
add on products - some listed in
http://plone.org/products/by-category/versioning-staging

> - Zope/Plone is also quite slow.

Really? How so?

> - Code we have now cannot get away from original design flaws.  Might as well 
> start from scratch.

Refactoring may achieve the outcome better.

>
> The major issue is, I cannot see how I can get the current repository to 
> support CellML 1.1 models.  Sure, a new archetype can be written, and built 
> with ZCatalog and the like.  I still find this method to be ad-hoc, slapped
> together from semi-mismatching components to get it working, whereas the
> obvious solution of using a CMS with a database that points to the data
> would be the much more elegant solution (with a front-end written to
> interface with that).

I don't know what elegant means here. A diagram of the current
components might help for us to see what the current layout is like.
Sure, when I look at the code the lines between components are
blurred, but there is an architecture there that Carrey and Andre were
working towards. It would be useful to see the landscape as it is now.

>
> Oh, how is it ad-hoc?  I still do not have this resolved, but there is no 
> "not" query in ZCatalog.  There is a product called 'AdvancedQuery' that 
> addresses that, but that's more dependency on yet more products to get
> something simple done.

What does query mean in the context of this project? Any call on the
ZCatalog is definitely a query in technical terms. Is there something
like an outer join in SQL that you want an analogy for in ZCatalog
querying? Have you looked at creating custom indexes? I suspect the
ZCatalog is more powerful than you think.
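
One way to think about a "not" query is as a set difference over
ordinary result sets. The sketch below illustrates the idea with a toy
in-memory index; ToyCatalog and its methods are invented for this
example and are not the ZCatalog API.

```python
class ToyCatalog:
    """A stand-in catalog: each index maps a value to a set of ids."""

    def __init__(self):
        self.indexes = {}
        self.all_ids = set()

    def catalog(self, record_id, **fields):
        """Index one record under each of its field values."""
        self.all_ids.add(record_id)
        for name, value in fields.items():
            index = self.indexes.setdefault(name, {})
            index.setdefault(value, set()).add(record_id)

    def query(self, **criteria):
        """Ids matching every criterion (set intersection)."""
        result = set(self.all_ids)
        for name, value in criteria.items():
            result &= self.indexes.get(name, {}).get(value, set())
        return result

    def query_not(self, include=None, exclude=None):
        """Ids matching `include` minus ids matching any `exclude` term."""
        matched = self.query(**(include or {}))
        for name, value in (exclude or {}).items():
            matched -= self.indexes.get(name, {}).get(value, set())
        return matched
```

For example, "published models that are not CellML 1.0" becomes
query_not(include={"state": "published"}, exclude={"cellml": "1.0"}).
Whether the real catalog can do the same efficiently is one of the
things a proof of concept would need to show.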

>
> There are more, but I will end it here.

Please don't. We need all of them.

>
> > 2) What are the use cases? An initial set should be extracted from the
> > current site. You have written out some, but they only covered a small
> > set of functions of the site, especially when it comes to relations
> > between models or workflow and curation states.
>
> Feel free to list some specific examples I have omitted like Andre and Andrew 
> did.  I do agree it is a small set, but I am starting from the basics and 
> moving up from there.  It will get quite complicated.

Document what the current workflow for the site is at the moment. Then
see where those can be cleaned up.

>
> >
> > I understand some of the details that are causing you pain with the
> > current implementation, but I think the first part of this is to be
> > charitable to the current system and adequately describe the two
> > points above.
> >
> > Before rethinking the implementation of this site I think the
> > following need to also be done:
> > - a specification for assigning a URI to these models (as would be
> > used by CellML 1.1 imports)
>
> I've outlined a few, but more details to come.
>
> > - a specification for how a manifest file is to be constructed, or
> > some set of rules for interpreting a directory structure of models,
> > especially in those cases where there are multiple local models used
> > in imports and we need to point to at least the top level model.
> > - a suggested solution to the bqs problem. Research existing standards.
> >
>
> I did consider that, and I think OpenURL may suit our needs fairly well.  It 
> is already an established standard, it's about citations, got great support
> by the world (libraries and citation catalogs are using this), seems to have 
> everything bqs describes, and here's the spec:
>
> http://www.niso.org/standards/resources/Z39_88_2004.pdf
>
> However, it's in XML only, but near the bottom of page 23 of that file, I 
> quote:
>
> > - To support new applications, communities could introduce new XML-based 
> > ContextObject Formats constrained by other syntactic constraint languages 
> > (DTD or RELAX NG, for example) or semantic constraint languages (RDFS or 
> > OWL, for example).
>
> Nothing is really stopping us from adapting that standard, aside from having 
> to rewrite/regenerate all metadata we have now.

I'm not familiar with this (yet). We need to write out where URL has
meaning (for example in a URI of a model import, in being able to
access versions or changesets, etc) and then weigh up the options.
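
To make the options concrete, here is a small sketch of one possible
scheme for addressing a file at a given revision and a whole
changeset. The "browser" and "changeset" path segments, the "rev"
parameter and the example.org base are all assumptions chosen for
illustration, not a decided layout.

```python
from urllib.parse import quote, urlencode


def file_uri(base, path, rev=None):
    """URI of one file, optionally pinned to a revision number."""
    uri = "%s/browser/%s" % (base.rstrip("/"), quote(path.lstrip("/")))
    if rev is not None:
        uri += "?" + urlencode({"rev": rev})
    return uri


def changeset_uri(base, rev):
    """URI of everything committed in one revision."""
    return "%s/changeset/%d" % (base.rstrip("/"), rev)
```

A CellML 1.1 import could then point at something like
file_uri("http://example.org/models", "heart/heart.cellml", rev=42),
which pins the import to one version of the imported model.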

>
> > Generally:
> >
> > Relational databases are useful, but so are the combination of
> > ZCatalog and Sets. It really depends on the structure of the data and
> > the queries you want to perform. You should write out a reasonable set
> > of these in natural language to get the focus right. Maybe a proof of
> > concept using various mechanisms is required.
> >
>
> Will get to that.  I am at the research stage still, but I did have some 
> preliminary schemas down.
>
> > The frustration with metadata handling at the moment is a result of
> > some difficulties in the metadata specification for the metadata you
> > are using the most and also the use of a quite esoteric system:
> > 4Suite's Versa RDF query interface. RDQL or SPARQL are better SQL-like
> > equivalents and certainly have a wide acceptance.
> >
>
> While Versa itself is not so bad, the intricacies and gotchas of 4Suite
> were quite unpleasant to deal with.  I must note I did not decide to use that,
> I merely inherited code that used it so I am sort of stuck.  If I had a 
> choice I would be using RDFlib and use SPARQL provided by it.  Yes, it has to 
> do with frustration of both 4Suite and the metadata specification.
>
> > Subversion offers a nice philosophy of code management and the guess
> > is that this would apply well to the modeling process. It also offers
> > the potential for building URIs for versioned material - individual
> > files and whole changesets (which is something we are after). The
> > default webdav URI scheme may not be what we want, so it is also worth
> > looking at others; for example, the trac browser interface to a
> > subversion repository forms quite nice URIs.
> >
>
> Yes, I am doing research into those also.  The HTTP interface would be built
> on top of that also.
>
> It is my desire to use a _real_ code management backend to manage models so I 
> don't have to start writing a versioning mechanism into the repository like 
> what we have now.
>

Don't forget to look at the plone 3 state I mentioned above. But since
one of our use-cases is for someone to be able to work with data on
their filesystem and submit it through the command-line or some
file-browser tool, the subversion client process is already available
and has various attractive features.

> > Workflow and security as defined and implemented by Zope/CMF/Plone is
> > a very nice model that should be reflected in our workflow and
> > security use-cases. We discussed a few weeks ago that if this
> > environment is going to provide the security layer, then there needs
> > to be a relationship between this and the subversion repository at
> > quite a detailed level.
> >
>
> The workflow and security of Zope/CMF/Plone will definitely be used, and will
> be mapped in some way into the model repository interface (abstraction layer, I
> called it).  I will give this more thought when I have the foundations down 
> (i.e. interface between subversion/code and the database of metadata, 
> submitted models, etc).

This feels like an issue that should be advertised on plone/zope lists
sooner rather than later; perhaps there are already some products out
there to help or other people are thinking of it, or someone has
thought of it and found compelling reasons not to do it. Either way, I
think the appeal of this subject to the greater community would be
quite high.

cheers
Matt

>
> Thank you for your thoughts,
> Tommy.
>
> > cheers
> > Matt
> >
> >
> > On 6/21/07, Tommy Yu <[EMAIL PROTECTED]> wrote:
> >> Hi,
> >>
> >> I have written down some of my thoughts on how the model repository could 
> >> be put together.
> >>
> >> http://www.cellml.org/Members/tommy/repository_redesign.html
> >>
> >> It is still a pretty rough document.  The usage example section gives a 
> >> rough outline on what I see people might be doing with the repository and 
> >> how this design could address those issues, which I think will be of
> >> interest to users.  It is not an exhaustive list, yet.
> >>
> >> I must also note the design outlined is quite a drastic departure from 
> >> what we have now (it will be yet another new repository).  However, it is 
> >> more true to the one envisioned before according to 
> >> http://www.cellml.org/wiki/CellMLModelRepositories, except I have an 
> >> additional layer that will assist in pulling content and drawing
> >> relationships between models.
> >>
> >> Feel free to take it apart and/or build on top of it.
> >>
> >> Cheers,
> >> Tommy.
> >> _______________________________________________
> >> cellml-discussion mailing list
> >> [email protected]
> >> http://www.cellml.org/mailman/listinfo/cellml-discussion
