-------- Original Message --------
Subject: Re: Questions about the CCGS, and 3 possible bugs
Date: Wed, 25 Oct 2006 10:50:55 +1300
From: Andrew Miller <[EMAIL PROTECTED]>
To: Jonathan Cooper <[EMAIL PROTECTED]>
References: <[EMAIL PROTECTED]>

Jonathan Cooper wrote:
> Hi Andrew,
>
> You may be aware that a group of people are writing a review article
> on CellML and associated tools, for a special issue of Progress in
> Biophysics and Molecular Biology. I've been put in charge of the
> section on code generators, so have been looking at the CCGS to see
> what I can write about it.
>
> Firstly, I've found a few things that may be bugs. The first is that
> whatever model I generate code for, the size of the RATES array is
> declared to be 1, despite there clearly being more than 1 rate
> variable (e.g. for the Beard 2005 model there are 19, but the header
> still has RATES[1]).

This turned out to be a bug in the Plone product (which provides a web interface to CCGS). I have made a fix for this; it just needs to be deployed onto the website (I will speak to our webmaster about getting this done).

>
> The second is that a couple of models give a UnicodeDecodeError with
> "'ascii' codec can't decode byte 0xc3 in position 8548: ordinal not in
> range(128)". The Hodgkin-Huxley model in the repository is one such.
> Skimming through the CellML source, I can't spot any obvious
> occurrences of weird characters, so do you have any idea what might
> give rise to this? It could be useful to work around such errors, in
> any case.

I tried this with the command-line version of CCGS, and it works okay. I don't think this is a CCGS bug, but rather an issue with the Plone product again (it is trying to convert from ASCII to UTF-16, when it should be converting from UTF-8 to UTF-16, because the CCGS works with UTF-16 strings). I will look into what can be done about this.

>
> The third is the handling of division. As I understand it, all
> numbers in CellML models are implicitly double valued.
> However, for the XML fragment
>   <apply><eq/>
>     <ci>V</ci>
>     <apply><divide/>
>       <cn cellml:units="volt">1</cn>
>       <cn cellml:units="dimensionless">2</cn>
>     </apply>
>   </apply>
> the CCGS generates
>   VARIABLES[0] = (1/2);

I have now fixed the CCGS to always put decimal points on values taken from cn elements, and likewise for unit conversion factors / offsets.

> which will perform an integer division (in C).
>
>
> Now for some more general comments.
>
> Is the source code for the CCGS available anywhere? (Is it open
> source?)

CCGS is tri-licensed under the GNU General Public License, Mozilla Public License, and GNU Lesser General Public License. It is essentially an optional extension module to the CellML API, and so is shipped with it. The SVN repository is not yet public (our IT group are working on it, and apparently making progress). However, snapshots of PCEnv and all dependencies are now being regularly put on FTP (PCEnv uses the CCGS, so its source code gets automatically released whenever a Linux binary is made). You can download the CellML API snapshots from ftp://ftp.bioeng.auckland.ac.nz/pub/physiome/cellml_api/snapshots/source (the snapshot process is automated but manually initiated at the moment, so you can e-mail me and I can run the script). When you get the CellML API, look at interfaces/CCGS.idl and, if you want, the files in CCGS/sources.

> What language is it written in? (The UnicodeDecodeError is
> Python, but I seem to recall the CCGS is in C.)

The generated code is C, but the generator is in C++, like the rest of the CellML API. The Python error you are seeing is from the CellML repository Plone product, which is in Python, but accesses the CellML repository across CORBA.

> Is there any documentation on the algorithms it uses?

Aside from the source code, and any comments in the code, no.

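Returning to the division issue above, here is a minimal sketch of why the decimal points matter in C. The two function names are hypothetical and exist only to contrast the two forms of the expression; this is an illustration, not actual CCGS output.

```c
#include <stdio.h>

/* Hypothetical helper: the expression as originally generated.
 * Both operands are int literals, so C performs integer division
 * (1/2 == 0) and only then converts the result to double. */
static double divide_without_decimal_points(void)
{
    double VARIABLES[1];
    VARIABLES[0] = (1/2);
    return VARIABLES[0];
}

/* Hypothetical helper: the same expression with decimal points added.
 * Both operands are now double literals, so the division is done in
 * floating point. */
static double divide_with_decimal_points(void)
{
    double VARIABLES[1];
    VARIABLES[0] = (1.0/2.0);
    return VARIABLES[0];
}
```

Printing both results with printf("%g\n", ...) from a small main function shows the difference: the first form loses the fractional part before the assignment ever happens.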
> My thinking for the structure of the code generation section is to
> start with some motivation - interpreting XML is slow, and people want
> to plug models into their existing simulation software.

You should probably also mention that CellML is a declarative language, not a procedural one (I think this is often misunderstood by people new to CellML), and so it states what the relationships between the variables are, rather than a direct process for computing variables. Whether you are interpreting XML, generating 'source' code of some form, or directly generating machine code, at some point you need to translate from a declarative view into a procedural view, and this is the key role that these frameworks play.

> Then I'll
> discuss common features of the various code generators - they all view
> a model as an ODE system, for example (in fact, I think they all only
> treat initial value problems). This part will probably need to
> include some comments on how variables are classified (as
> constant/computed/rate/etc.).

The framework generalises to support initial value problems, but you could still use it to compute expressions in terms of initial conditions.

>
> Then I'll consider features of each tool, asking questions such as
> Which languages (and simulation software) do the tools target?

Currently CCGS targets C (which is why it is called the C Code Generation Service), although there are future plans to split the common parts out and write generation services for other languages.

> What assumptions do they make about the input model?

There are a few assumptions the current code makes (although these may be relaxed in the future):
1) Equations involving differentials must have the differential by itself on one side of the equals sign.
2) Every variable is assumed to be real valued (no complex numbers, vectors, matrices, etc...).
3) Set logic and propositional calculus are not supported, aside from the logical operators (e.g.
you can't define a summation or integral over a set). However, you can define complex logical expressions using and, or, and the other binary/unary logical operators as the condition of a piecewise equation.

> Is there any flexibility in output format?

CCGS is an API, not a program targeting end-users, and the API provides general information about the code, as well as 'fragments' of code (which compute certain parts of the model). The fragments contain a block of equations, and have variables which may be renamed through the use of C preprocessor macros. However, specific programs built on top of the API, such as the Physiome Model Repository, do not currently provide much flexibility in the output.

> Can they easily be modified to change the output format?

It is fairly simple to write a different program which uses the CCGS API to generate code in your desired format.

> Are there any special points to note?

There have been several successful uses of CCGS:
1) The CellML Integration Service (CIS), which also comes with the CellML DOM API, uses the GNU Scientific Library and the CCGS to run simulations. It is used by PCEnv to allow people to run simulations.
2) David Nickerson has used the CCGS (API) to generate code suitable for use with SUNDIALS.
3) CCGS comes with a test command-line program, called CellML2C, which calls the CCGS API to generate C code.
4) CCGS is used by the Physiome Model Repository Plone product to add a 'Procedural Code' tab to models in the CellML model repository. There is also a separate Plone product, called CCGSPlone, which allows users to upload their own private CellML models over HTTP, and get procedural code back.

It is worth noting that although CCGS supports definite integrals and Newton-Raphson solves of equations which CCGS can't use directly, no one has yet used this functionality (it is planned for the near future, so may be out before your paper gets published.
mozCellML used to support these two features using its own code generator).

>
> Finally I'll look at optimisation, and my own tools for this.

CCGS performs a minor optimisation: equations are separated into those which need to be run once, and those which must be run after every time step, to avoid unnecessary recomputation. It relies on the C compiler to perform constant folding optimisations. It cannot optimise too aggressively, because it is designed to allow initial values and parameters (other than those set through cn elements) to be changed without having to recompile.

Best regards,
Andrew

PS: Do you mind if I also send this to the CellML Discussion list, as my answers are likely useful to other people as well?

_______________________________________________
cellml-discussion mailing list
[email protected]
http://www.cellml.org/mailman/listinfo/cellml-discussion
