Hi Tommy,

Can you continue to update and fill out your document, and begin the associated proposals using the information in the replies people are submitting? The goal of this process is a scoping document with associated content.
More comments below.

On 6/22/07, Tommy Yu <[EMAIL PROTECTED]> wrote:
> Matt wrote:
> > Hi Tommy,
> >
> > I found the document seemed to be too far ahead of itself. I also
> > didn't find any of the pros and cons very compelling because they
> > don't address specific problems and those problems are not described.
> >
> > 1) What are you actually trying to achieve? It would be useful to
> > describe the parts of the current system that are giving you grief and
> > look to give you more grief based on the use cases and any axes of
> > scale.
> >
>
> Starting with what I envisioned.
>
> Who is the repository catered for?
> 1) People who would like to work on models, using it as a place to store
> work-in-progress models.
> 2) Reviewers to review models.
> 3) Website users to browse models.
>
> 1) What do the model builders want?
> - Their own workspace (home directory)
> - A place to let reviewers review their models
> - Also to publish their models
>
> First point is not addressed by what we have now. Second and third point is
> quite ad-hoc. Also, version control is very ad-hoc right now.

Each of these points needs to be filled out, e.g. what does it mean to have a workspace for a CellML modeller? What are the scenarios and workflows for reviewers of CellML models?

>
> 2,3) Reviewers and website users
> - A centralized location to browse models.
> - They would like to see how models may relate to each other.
>

How do models relate to each other? Relations between models come from all sorts of data within models, and within any associated metadata (so more than just our current CellML metadata specification). It would be useful to write out the details of the relationships that are important here, as these pretty much form the basis of many of the queries that will need to be performed.

> First point is already addressed, but second point is definitely not possible
> as the current repository does not support 1.1.

Why does it not support CellML 1.1? i.e.
what is the technology block that prevents extending the current system to support it?

>
> Issues:
> - Flat file system.
> Sure, using ZCatalog it is possible to emulate users' home directories and
> the like, but it still does not get away from what we have now.

I don't understand this. What are you aiming for in a home space, and why doesn't the current system support it?

> - Version/Variant
> It already clogged up the system. There is no proper revision control
> mechanism, what we have now is an ad-hoc emulated system.

I don't think it has clogged the system; I just think it has been improperly used, both by authors and by the user interface. This is no fault of the authors; a specification for versioning is simply missing. The hope is that Subversion applies well to this.

> - It's CellML Code, right?
> Why not put code in a real code management system, like Subversion?

Subversion works well for filesystems of code and text data, and to some extent binary data that we don't really need to query the contents of. If this applies well for CellML modelling, then Subversion is probably a good match. Subversion will bring its own complexities when we are dealing with applying security to file objects, and security/publishing in general will get even more complex if we are proxying remote repositories - which we talked about a few weeks ago.

Generally, I think the concept of CellML modelling being laid out in a filesystem, with Subversion versioning concepts applied to it, is good but untested. For instance, take a reasonably complex model of Andre's and work out how it will look on the filesystem and what Subversion versioning would result in.

While in this thread, I don't believe metadata should be treated any differently to model data. Adding special rules for versioning of some data and not others is going to complicate the versioning process, and I can't see any compelling reason to do this.
Remember that the Subversion system is versioning file objects which will contain both metadata and CellML model data. What is important is how and where metadata is stored. Perhaps metadata should be separated into its own document sitting next to the model in the filesystem.

My inclination is that an implementation using Subversion plus some Subversion hooks will be OK, but we haven't worked out details or done any proof of concept for this - which should be agnostic to CellML and focussed on how to apply Zope+CMF security and workflows to data objects stored in Subversion repositories.

> - Zope has revision control
> Until someone packs the database.

Perhaps you should look at http://plone.org/products/plone/roadmap/8 (which is now completed and merged into Plone 3). There are some other add-on products - some listed in http://plone.org/products/by-category/versioning-staging

> - Zope/Plone is also quite slow.

Really? How so?

> - Code we have now cannot get away from original design flaws. Might as well
> start from scratch.

Refactoring may achieve the outcome better.

>
> The major issue is, I cannot see how I can get the current repository to
> support CellML 1.1 models. Sure, a new archetype can be written, and built
> with ZCatalog and the like. I still find this method to be an ad-hoc slapped
> together with semi-mismatching components to get it working, whereas the
> obvious solution to use a CMS with a database that points to the data would
> be the much elegant solution (with a front-end written to interface that).

I don't know what elegant means here. A diagram of the current components might help us to see what the current layout is like. Sure, when I look at the code the lines between components are blurred, but there is an architecture there that Carrey and Andre were working towards. It would be useful to see the landscape as it is now.

>
> Oh, how is it ad-hoc? I still do not have this resolved, but there is no
> "not" query in ZCatalog.
> There is a product called 'AdvancedQuery' that
> address that, but that's more dependency on yet more products to get
> something simple done.

What does query mean in the context of this project? Any call on the ZCatalog is definitely a query in technical terms. Is there something like an outer join in SQL that you want an analogy for in ZCatalog querying? Have you looked at creating custom indexes? I suspect the ZCatalog is more powerful than you think.

>
> There are more, but I will end it here.

Please don't. We need all of them.

>
> > 2) What are the use cases? An initial set should be extracted from the
> > current site. You have written out some, but they only covered a small
> > set of function of the site, especially when it comes to relations
> > between models or workflow and curation states.
>
> Feel free to list some specific examples I have omitted like Andre and Andrew
> did. I do agree it is a small set, but I am starting from the basics and
> moving up from there. It will get quite complicated.

Document what the current workflow for the site is at the moment. Then see where it can be cleaned up.

>
> >
> > I understand some of the details that are causing you pain with the
> > current implementation, but I think the first part of this is to be
> > charitable to the current system and adequately describe the two
> > points above.
> >
> > Before rethinking the implementation of this site I think the
> > following need to also be done:
> > - a specification for assigning a URI to these models (as would be
> > used by CellML 1.1 imports)
>
> I've outlined a few, but more details to come.
>
> > - a specification for how a manifest file is to be constructed, or
> > some set of rules for interpreting a directory structure of models,
> > especially in those cases where there are multiple local models used
> > in imports and we need to point to at least the top level model.
> > - a suggested solution to the bqs problem.

Research existing standards.
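On the "rules for interpreting a directory structure" point above, one possible rule is that a top-level model is any CellML 1.1 file that no other file in the directory tree imports. The following is only a minimal sketch of that rule, not a proposed specification. It assumes imports are written as `import` elements in the CellML 1.1 namespace with an `xlink:href` attribute, and that local imports use relative paths. The function names (`imported_paths`, `top_level_models`) and the example file layout are hypothetical.

```python
import posixpath
import xml.etree.ElementTree as ET

# Namespaces used by CellML 1.1 documents and XLink attributes.
CELLML_NS = "http://www.cellml.org/cellml/1.1#"
XLINK_NS = "http://www.w3.org/1999/xlink"


def imported_paths(model_path, xml_text):
    """Return the repository-relative paths that one model imports.

    Only relative (local) hrefs are considered; absolute URLs are skipped.
    """
    root = ET.fromstring(xml_text)
    base = posixpath.dirname(model_path)
    paths = set()
    for imp in root.iter("{%s}import" % CELLML_NS):
        href = imp.get("{%s}href" % XLINK_NS)
        if href and "://" not in href:
            # Resolve the href against the importing file's directory.
            paths.add(posixpath.normpath(posixpath.join(base, href)))
    return paths


def top_level_models(documents):
    """Given {path: xml_text}, return the paths no other document imports."""
    imported = set()
    for path, text in documents.items():
        imported |= imported_paths(path, text)
    return sorted(p for p in documents if p not in imported)


# Hypothetical three-file layout: one main model importing two local files.
_HEAD = ('<model xmlns="%s" xmlns:xlink="%s" name="%%s">'
         % (CELLML_NS, XLINK_NS))
EXAMPLE = {
    "heart/main.cellml": (
        _HEAD % "main"
        + '<import xlink:href="parts/sodium.cellml"/>'
        + '<import xlink:href="../shared/units.cellml"/>'
        + "</model>"
    ),
    "heart/parts/sodium.cellml": (
        _HEAD % "sodium"
        + '<import xlink:href="../../shared/units.cellml"/>'
        + "</model>"
    ),
    "shared/units.cellml": _HEAD % "units" + "</model>",
}

if __name__ == "__main__":
    print(top_level_models(EXAMPLE))
```

A rule like this needs no manifest file when a directory has a single entry point. An explicit manifest would still be needed where several unrelated top-level models share one directory, or where the intended entry point is itself imported elsewhere.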
> >
>
> I did consider that, and I think OpenURL may suit our needs fairly well. It
> is already an established standards, it's about citations, got great support
> by the world (libraries and citation catalogs are using this), seems to have
> everything bqs describes, and here's the spec:
>
> http://www.niso.org/standards/resources/Z39_88_2004.pdf
>
> However, it's in XML only, but near the bottom of page 23 of that file, I
> quote:
>
> > - To support new applications, communities could introduce new XML-based
> > ContextObject Formats constrained by other syntactic constraint languages
> > (DTD or RELAX NG, for example) or semantic constraint languages (RDFS or
> > OWL, for example).
>
> Nothing is really stopping us from adapting that standard, aside from having
> to rewrite/regenerate all metadata we have now.

I'm not familiar with this (yet). We need to write out where URL has meaning (for example, in a URI of a model import, in being able to access versions or changesets, etc.) and then weigh up the options.

>
> > Generally:
> >
> > Relational databases are useful, but so are the combination of
> > ZCatalog and Sets. It really depends on the structure of the data and
> > the queries you want to perform. You should write out a reasonable set
> > of these in natural language to get the focus right. Maybe a proof of
> > concept using various mechanisms is required.
> >
>
> Will get to that. I am at the research stage still, but I did have some
> preliminary schemas down.
>
> > The frustration with metadata handling at the moment is a result of
> > some difficulties in the metadata specification for the metadata you
> > are using the most and also the use of a quite esoteric system:
> > 4Suite's Versa RDF query interface. RDQL or SPARQL are better SQL-like
> > equivalents and certainly have a wide acceptance.
> >
>
> While Versa itself is not so bad, but the intricacy and gotcha's of 4Suite
> was quite unpleasant to deal with.
> I must note I did not decide to use that,
> I merely inherited code that used it so I am sort of stuck. If I had a
> choice I would be using RDFlib and use SPARQL provided by it. Yes, it has to
> do with frustration of both 4Suite and the metadata specification.
>
> > Subversion offers a nice philosophy of code management and the guess
> > is that this would apply well to the modeling process. It also offers
> > the potential for building URIs for versioned material - individual
> > files and whole changesets (which is something we are after). The
> > default webdav URI scheme may not be what we want, so it is also worth
> > looking at others; for example, the trac browser interface to a
> > subversion repository form quite nice URIs.
> >
>
> Yes, I am doing research into those also. The HTTP interface would built on
> top of that also.
>
> It is my desire to use a _real_ code management backend to manage models so I
> don't have to start writing a versioning mechanism into the repository like
> what we have now.
>

Don't forget to look at the Plone 3 state I mentioned above. But since one of our use cases is for someone to be able to work with data on their filesystem and submit it through the command line or some file-browser tool, the Subversion client process is already available and has various attractive features.

> > Workflow and security as defined and implemented by Zope/CMF/Plone is
> > a very nice model that should be reflected in our workflow and
> > security use-cases. We discussed a few weeks ago that if this
> > environment is going to provide the security layer, then there needs
> > to be a relationship between this and the subversion repository at
> > quite a detailed level.
> >
>
> The workflow and security Zope/CMF/Plone will definitely be used, and will be
> mapped some ways into the model repository interface (abstraction layer, I
> called it). I will give this more thought when I have the foundations down
> (i.e.
> interface between subversion/code and the database of metadata,
> submitted models, etc).

This feels like an issue that should be advertised on the Plone/Zope lists sooner rather than later; perhaps there are already some products out there to help, or other people are thinking of it, or someone has thought of it and found compelling reasons not to do it. Either way, I think the appeal of this subject to the greater community would be quite high.

cheers
Matt

>
> Thank you for your thoughts,
> Tommy.
>
> > cheers
> > Matt
> >
> >
> > On 6/21/07, Tommy Yu <[EMAIL PROTECTED]> wrote:
> >> Hi,
> >>
> >> I have written down some of my thoughts on how the model repository could
> >> be put together.
> >>
> >> http://www.cellml.org/Members/tommy/repository_redesign.html
> >>
> >> It is still a pretty rough document. The usage example section gives a
> >> rough outline on what I see people might be doing with the repository and
> >> how this design could address those issues, which I think it will be of
> >> interest to users. It is not an exhaustive list, yet.
> >>
> >> I must also note the design outlined is quite a drastic departure from
> >> what we have now (it will be yet another new repository). However, it is
> >> more true to the one envisioned before according to
> >> http://www.cellml.org/wiki/CellMLModelRepositories, except I have an
> >> addition layer that will assist in pulling content and drawing
> >> relationships between models.
> >>
> >> Feel free to take it apart and/or build on top of it.
> >>
> >> Cheers,
> >> Tommy.
> >> _______________________________________________
> >> cellml-discussion mailing list
> >> [email protected]
> >> http://www.cellml.org/mailman/listinfo/cellml-discussion
