I have often thought that referencing external code through a clearly defined interface would be useful, mostly because procedural code is another natural way to solve problems. But I have always banged my head against validation. With procedural code, validation amounts to passing tests (good tests) and being confident that the code will break in useful ways when it does break. I don't see this as being any different from the intended outcome of valid, purely declarative CellML models.
At first glance it might seem more taxing for developers who want to use CellML in their applications if they also need to handle external code. But this proposal for external code is very specific to the math declarations. I think that, whether the math is represented in MathML or as an external source of procedural code, an application investigating the math will struggle to make decisions without sufficient annotation. That annotation should classify the math formulations so that a machine can filter what it is and is not capable of processing. In some cases I imagine the application developer would welcome a particular math problem already being coded in a language that could be compiled and run.

If that thought is continued, then there is a place for a model representation in which all the math is represented by external code, with the model structure represented in CellML. This would obviously assume that particular decisions about simulating the model had already been made. It is indeed a different scenario from the purely declarative model, which seeks to explain the mathematical problem at a higher level and leaves it to applications to work out the simulation. At the moment we don't actually have a useful way of providing a CellML model with enough machine-readable information for someone to rerun our model in exactly the same way as we did. By referencing and/or including external code, we allow models to be exchanged at the simulation level, which is actually not a bad thing if our goal is to promote collaborative model building. I do think there is a possibility that people would abuse this, i.e. jump straight to binding bits of code here and there together with CellML; but if we maintain standards and best practice, it should be easy to show them up.
Also, perhaps we should trust people to evolve towards resorting to external code only when it absolutely is the best way to solve their problem.

There are a couple of things we could lose by bringing in external code:
1) The ability to produce human-readable equations for publication that accurately reflect the mathematics in the model. Annotation of the algorithms or maths in the external code would help, but would not guarantee that the publication reflected exactly what was encoded in the model.
2) The ease of creating machine-readable annotation for the parts of the external code that require it. One example is binding each 'component' of a model to the relevant part of a reaction network under MIRIAM. This is where you would question the modeler as to whether their external code should be broken down and spread across models. But they may not have control over the external source. Alternatively, they may be exchanging models that have necessarily lumped a lot of biological concepts into one piece of external code (a library) because it is more efficient to solve that way. Either way, we now have a non-MIRIAM-compliant model.

I would like to think that including links to external code in the CellML specification would push us to make a bigger effort on model validation procedures. It would also encourage more involvement from the various modelers out there sitting on code that works, rather than leading us to think we will somehow lose some of CellML's high-level elegance.

Specific comments (quoted pieces from http://www.cellml.org/Members/miller/bcp-external-models/ are enclosed in triple quotes):

"""[CellML] models are very good at describing complete mathematical models in a format which can be exchanged between model authors and users. This adds significant value to a model representation, because third parties can take the model, and use it in their preferred software packages to reproduce any results the author published."""

We need some clear examples of model types that cannot be expressed in CellML, i.e. algorithms that are at the moment best (or only) expressible in procedural code. I know that various neural network models and genetic-algorithm-based learning systems have evolved mainly from procedural thinking. I think we really need to consider that some problems would be much better understood by model authors if they were expressed in procedural code.

"""Having part of a model expressed in CellML, and other parts expressed in some more generic language is still useful, because it means that the common part of the model can be re-used more easily, either by providing external code of a different kind, or, where possible by replacing the external code with MathML."""

If external code can be replaced with MathML, then why wouldn't it have been in a CellML component in the first place? I see a pro and a con when someone encodes most of a model in an external code block bound into a single component of the model. The pro is that this may have encouraged someone to actually bother using CellML. As a first step, they simply wrapped their existing code, and it would then be up to repository maintainers to encourage a breakdown of the model. The con, of course, is that we lose model structure into the external code, and there is no way to extract it automatically. It is therefore effectively hidden until the model is broken down, if that ever makes sense for the model.
"""It is also hoped that this specification will encourage model developers to build up libraries of CellML accessible external code, which can be re-used in a range of CellML models, therefore increasing the range of modelling techniques available to CellML model authors."""

I would see an open library of external code as very useful. There would need to be a clear grading of that code, for example validating that the code even compiles (if it needs to) and runs on platforms x, y and z.

"""Best practice guidelines for CellML document authors"""

"""1. External code should be used only where a part of a model cannot be adequately expressed in CellML. External code is often non-portable, and using it reduces the re-usability of your model, and so it should only be used when needed."""

Yes.

"""2. External code should only perform the calculations that CellML is unable to perform, with the rest of the calculations expressed as MathML, in the CellML model. This is important, because it increases the fraction of your model that can be more easily re-used by other modellers. It also means that CellML editing and visualisation software will allow your model to be edited and visualised better."""

Yes and no. I don't think representing the math in MathML makes re-use any easier unless everyone shares a prescribed subset of MathML. Everyone would also need to agree on the acceptable forms of equations for cases where algebraic manipulation is limited or impossible.

"""3. Modellers should, where feasible, separate external code into as many different sub-functions as possible. For example, if you have external code to compute y1 from x1 and x2, and y2 from x1 and x2, you should write this as two separate external function applications, unless there is a compelling reason to do otherwise (such as is the case if it is much more efficient to compute them together). Doing this makes it easier to modify the CellML model in the future, and allows the CellML processing software to determine the order in which expressions are evaluated, making your model more flexible."""

See above. The compromise will always be the amount of information you can extract from the model for other purposes, for example model reuse, simply visualizing and understanding the makeup of the model, or publication. It could be compelling enough for people to produce at least one highly broken-down model alongside the one tailored for optimization.

"""4. External code should, by itself, meet [MIRIAM] requirements 1 and 2. This means that the external code should be encoded in a public, machine-readable format, and it should be valid and compilable."""

It should meet all the MIRIAM compliance criteria as part of the model as a whole. I think the test cases are going to be very important in assuring the quality of external code. You could make the case that external code should be wrapped in its own model, which would itself need to be fully MIRIAM compliant. The MIRIAM document is a bit weak around the edges on things like validation and the annotation of 'components' of a model. I think we need to be clear about what validation is necessary for models that reference external code. I would still like more clarification of how important MIRIAM is to this, especially since I think the MIRIAM requirements weren't really designed with typical procedural code in mind. That is not to say MIRIAM can't cope with it.

"""5. The external code should be treated as part of the model. When a model represented in CellML is published, the external code should be published alongside it, unless it is part of a generally available library of external code."""

The latter part worries me a little: enter license bewilderment. But see 6.

"""6. The definitionURL used on csymbol elements should be a URL under the control of the author.
It is not necessary for there to actually be a document accessible at the URL, as it is merely intended as a unique identifier."""

What happens with multiple authors? Can an author always guarantee a way of creating a URL? I think this problem is related to 5. For example, if the source code for an external component is submitted to a repository and becomes licensed under that repository's terms, then the URL should probably be related to the repository. So ultimately, I think the domain that wants to guarantee that the source is perpetually available should be the domain that forms the base of the URL.

cheers
Matt

On 3/18/07, Andrew Miller <[EMAIL PROTECTED]> wrote:
> David Nickerson wrote:
> >> ECMAScript is not practical for use in modelling, because it is an interpreted, non-typed language, which necessarily means that it cannot be compiled and will be slower than compiled code.
> >>
> >
> > But CellML is a language for the description and exchange of mathematical models. It is not meant to be a one-off wonder describing the most efficient and best performing method for executing numerical computations.
> >
> > To turn a CellML model description into something useful for computation, that description has to be interpreted and compiled into some other format suitable for the environment using it...
> >
> > Surely in the same manner, a standard description of procedural code could then be interpreted by any number of applications in whatever manner they feel best suits their environment?
> >
> No, because CellML's restriction to expressions makes it much easier to work with, and this is what makes it declarative.
> You can perform a variety of manipulations on declarative expressions, but procedural code can basically only be run in the way it was written to run. For example, even working out whether procedural code will ever terminate ('The Halting Problem') has been proved not to be Turing-computable in the general case, and this is likely to be true for other types of manipulation too.
>
> Code can often be optimised and compiled, but the features of ECMAScript preclude many of the optimisations that a C compiler, for example, can make.
>
> For example, objects can have arbitrary properties, and there is no way to tell at compile time what set of properties an object will have, or whether a property is a simple property or a getter. While a C compiler might take a value from an offset into a structure, ECMAScript code would end up searching a dictionary of properties on an object. Therefore, ECMAScript is not a good language if you want to be able to interpret it in different ways (and for any Turing-complete language, the ways in which you can interpret it are severely limited).
>
> Remember also that CellML models can be used to solve a range of different problem types (fitting, ODE time course, and so on), but one procedural code implementation might not be useful for all of them.
>
> My BCP document is intended as a way to keep as much of the model as possible in CellML, but simply leave the rest of the model unspecified. Given the long history and development of procedural languages, I don't think we can hope to 'standardise' anything more in a widely acceptable way when it comes to procedural languages.
>
> >
> >> External code needs to be extensible, and hence outside the scope of the CellML specifications, for several reasons:
> >> 1) Performance. Code may need to be written in a way which is specific to a particular platform in order to be able to perform well.
> >>
> >
> > Same response as above.
> >
> There will always be cases where human intervention is required to save a model from infeasible performance. If we take an ideological approach and try to block this from happening, the result will just be that CellML is not used at all. Instead, it is better to encourage people to use CellML features whenever possible, but allow external code when it is not possible.
> >
> >> 2) Access to existing libraries. There are often extensive libraries and other software packages into which a model needs to be integrated. These could be in practically any language, and so it would be necessary to access the data structures of these libraries to make the model work. I believe that this is the case for much of the CMISS-CellML work (I don't really think that a proposal to re-write CMISS in ECMAScript would be very popular!).
> >>
> >
> > In every case of people using CMISS that I know of, the use of CellML is to define model-specific mathematical equations for integration into a larger model.
> In other words, the model consists of parts which can be expressed as mathematical equations, and parts that cannot be expressed as mathematical equations (in CMISS). You are proposing that the parts which cannot be expressed as mathematical equations be written in ECMAScript.
> > I'm not suggesting re-writing CMISS in ECMAScript; rather, you seem to be suggesting including CMISS in a CellML model?!?
> >
> Which model is included in which is more an artificial distinction than anything meaningful. However, there needs to be a mechanism for data flow from CMISS into the CellML models. Otherwise CMISS can only set initial conditions; it can't have any time-dependent influence on the model.
> > This would hold for most such cases of using existing libraries that I can think of, with the exception of someone wanting to solve a particular equation or set of equations in a model using a very specific numerical method that their CellML simulation tool does not support.
> >
> There are many other computations that are better done by procedural code than by systems of ODEs. Machine learning algorithm lookups are one example of this, and there are extensive libraries of these sorts of things available.
> > Even if you take a step back and look at the larger picture of using things like FieldML, CellML, MathModelML (or something), etc. to describe something like an electrical propagation model in the heart, the tool (e.g. CMISS) pulls it all together and plugs fields and variables together based on the model annotations. Otherwise you'll end up with cell models that say things like "give me the current load at this point in space by solving the bidomain model over this geometric domain", making the cell model description useless for any other application. What you want instead is simply a variable in the model, representing the current load, with an interface of in. Your cell model integrator doesn't care where this value comes from; it just knows that when the tool calls for the cell model to be integrated, it will provide some appropriate value.
> >
> Firstly, I note that if you are talking about using component-level interfaces for this, that is not a feasible approach. Below is an e-mail I sent to Shane and Poul about this earlier:
>
> "
> Shane has proposed that as an alternative to using content MathML to reference external code, we could use components. However, this appears to be inconsistent with the way CellML works at the moment, so I don't think that it could form the basis for defining external functions.
>
> The problem with defining external components is that the directionality of variable interfaces in CellML is too weak to define the actual direction and order in which the mathematics is evaluated.
>
> This is a good thing, for two reasons:
> 1) CellML is inherently declarative, not procedural. Suppose you give an equation defining x in terms of a, b and c, but the other components in the model make x, a and b known, so it becomes necessary to obtain c. It is then perfectly valid for the CellML software to obtain c by a Newton-Raphson solve (or by algebraic manipulation, if it has the capability). However, if the directionality on components were strong, CellML processing software would be constrained to compute components in a certain way. This would in turn limit the flexibility of each component.
>
> 2) It is possible to have more than one mathematical equation in a single component, and in some cases these might be completely independent. For example, you might have, in one component:
>
> w = x + a
> y = z + b
>
> and in another:
>
> z = w + c
>
> with x the bound variable of integration, and a, b and c constant.
>
> This might make sense, because components are generally used to represent entities in biology, rather than the actual directionality of the mathematical equations. However, it means that you evaluate part of the first component, then part of the second, and then go back to part of the first component. This is something you couldn't do if each component were an external block.
>
> Given that we don't have a one-equation-per-component system, it is also possible that you want to combine mathematics in MathML with the external code (perhaps to re-parameterise the function, or something like that).
>
> Because of this, I am still convinced that defining external operators using MathML is a better approach than trying to overload the CellML component system for a purpose other than the one it was originally intended for.
> "
>
> Secondly, the "cell model" integrator does need to care where the values come from, because it is responsible for moving from one time point to the next. To do this, it needs to know which values from the current time point are needed to compute which other values at the current time point. This is why I have defined an interface which, in a very MathML-natural way, describes the inputs and outputs of the external code. This is essentially equivalent to what you are talking about above, except that the inputs to the external code must be provided as well.
>
> >
> >> 3) Access to specialised hardware. A model could potentially even require that a function is evaluated by some sort of online experimental procedure (perhaps automated probing of a hardware model) for a given set of inputs.
> >>
> >
> > Again, this seems more like a case where you define a mathematical model which, given some input(s), produces some output(s). The controlling software would take the mathematical model definition in CellML and connect the appropriate inputs and outputs.
> This is exactly why we need a way to describe inputs and outputs, which is what I describe in the proposal.
> > I would really need a concrete example of why you would want to describe a mathematical model in CellML which requires input from specialised hardware. Surely you just define a variable that has an interface of in and annotate it such that the controlling software can find it and plug in the appropriate value(s)?
> >
> >
> >> 4) Multiple standards, with different communities who favour them.
> >> It would not be practical to get everyone involved with CellML to agree on a certain procedural programming language (even deciding on Fortran vs C++ etc. has been a challenge at this institute, and will probably be impossible for the wider CellML community).
> >>
> >
> > As above, you are not performing computations using CellML directly; you always turn the model description into something suitable for the computational environment in which the model is being used. That's the beauty of CellML: you can turn it into Fortran or C++, depending on your personal preference!
> >
> For CellML, it is irrelevant which language it is translated through, because it can't call external code anyway. But if we call external code, that external code can in turn call other external code. Also, CellML filled a new niche, whereas you seem to propose that we tell everyone which language to use, which is a contentious issue. Also note that, in general, you cannot turn ECMAScript into efficient C++.
> > CellML is all about being able to exchange a standard description of a mathematical model between potentially very different software environments. The whole idea is specifically not to specify the best way to compute outputs from the model, which seems to be what you are driving at... and the best way to compute outputs from a model is always going to depend on the target computational environment.
> >
> Which is why we keep the things that CellML can do well in CellML, while continuing not to specify how to do the things that CellML can't do well. That is why my proposal only provides details of the interface to the external code, and doesn't try to specify the external code itself.
> >
> >> As an example, consider my PhD project, where I plan to put machine learning components into CellML models:
> >> 1) Performance is likely to be important. If it is too slow, it might not be feasible to do at all.
> >> 2) I plan to use existing libraries, in a range of different languages.
> >> 3) I also have another (perhaps less common) gain from specifying the external functions without describing their details. I need to run different code in 'training' and 'simulation' modes. If I just wrote generic ECMAScript for the simulation case, there would be no simple way to deduce the training case. Because of this, it is probably good to keep the non-algebraic parts of the model completely separate, and leave them up to whoever implements the specific CellML processor.
> >>
> >
> > I'd probably need to see more detailed plans of exactly what you are planning to do before commenting on this. But from what I have seen, whenever anyone has wanted to include procedural code directly in a CellML model, it has always turned out that they were approaching the problem from the wrong direction.
> >
> > Just to reiterate, CellML is all about exchanging *descriptions* of mathematical models, not implementations of computational code.
> >
> Which argues for specifying how to interface with external procedural code, as in my original proposal, rather than specifying how to exchange the procedural code, as you have suggested.
> >
> >> That said, I think we could have multiple levels of degeneracy away from standardised code, where you only go down to the next level if the current one is impossible:
> >> 1) Pure CellML.
> >>
> >
> > Definitely.
> >
> >> 2) CellML with standardised Turing-complete code support.
> >>
> >
> > I can see why we should provide a mechanism for this, but have yet to see an example where it would be useful (other than to get around a particular tool's deficiencies).
> >
> >> 3) CellML with external (non-standardised) code.
> >>
> >
> > I still haven't seen a reason why this would ever be required.
> >
> > David.
> >
> Best regards,
> Andrew
>
> _______________________________________________
> cellml-discussion mailing list
> [email protected]
> http://www.cellml.org/mailman/listinfo/cellml-discussion
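Two passages above rest on the same idea: Andrew's w/y/z example and the BCP's recommendation 3 (splitting y1(x1, x2) and y2(x1, x2) into separate external function applications). In both, the processing software chooses the evaluation order from data dependencies rather than from component boundaries. The following is a minimal Python sketch of that idea. It assumes each equation, or external function application, can be written as output = f(inputs) with its inputs and outputs declared. The equation table, component names and numbers are hypothetical and simply mirror the example. This is not how any particular CellML tool works.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical equation table: output variable -> (component, inputs, function).
# Mirrors the example: w = x + a and y = z + b in one component, z = w + c in another.
EQUATIONS = {
    "w": ("first", ("x", "a"), lambda x, a: x + a),
    "y": ("first", ("z", "b"), lambda z, b: z + b),
    "z": ("second", ("w", "c"), lambda w, c: w + c),
}


def evaluation_order(equations):
    """Order outputs so each is computed after the outputs it depends on."""
    graph = {
        # Only variables computed by some equation count as dependencies;
        # x, a, b and c are supplied from outside.
        out: {v for v in inputs if v in equations}
        for out, (_, inputs, _) in equations.items()
    }
    return list(TopologicalSorter(graph).static_order())


def evaluate(equations, known):
    """Evaluate every equation once, in dependency order, from known values."""
    values = dict(known)
    for out in evaluation_order(equations):
        _, inputs, fn = equations[out]
        values[out] = fn(*(values[v] for v in inputs))
    return values


if __name__ == "__main__":
    order = evaluation_order(EQUATIONS)
    print([(EQUATIONS[v][0], v) for v in order])
    # -> [('first', 'w'), ('second', 'z'), ('first', 'y')]
    # Bound variable x and constants a, b, c at one point (arbitrary values).
    print(evaluate(EQUATIONS, {"x": 1.0, "a": 2.0, "b": 3.0, "c": 4.0}))
```

The resulting order visits the first component, then the second, then the first again. An opaque per-component external block would rule out exactly this interleaving. An external function application with declared inputs and outputs could sit in the same table as any other node.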
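Andrew's point 1) says that, given an equation defining x in terms of a, b and c, CellML software may solve for c by Newton-Raphson when x, a and b are already known. Below is a hedged Python sketch of that step. The relation x = a + b*c + c^3 is made up to stand in for a model equation. The central-difference derivative is an illustrative choice, not something the proposal specifies.

```python
def relation(a, b, c):
    """Hypothetical model equation defining x in terms of a, b and c."""
    return a + b * c + c ** 3


def newton_solve(g, target, guess, tol=1e-12, max_iter=50, h=1e-7):
    """Find u with g(u) == target by Newton-Raphson.

    The derivative is approximated by a central difference, so g only
    has to be evaluated in its forward direction.
    """
    u = guess
    for _ in range(max_iter):
        residual = g(u) - target
        if abs(residual) < tol:
            return u
        slope = (g(u + h) - g(u - h)) / (2.0 * h)
        if slope == 0.0:
            raise ZeroDivisionError("zero derivative; Newton step undefined")
        u -= residual / slope
    raise RuntimeError("Newton-Raphson did not converge")


if __name__ == "__main__":
    # Suppose the rest of the model has fixed x, a and b, and c is the unknown.
    x, a, b = 13.0, 1.0, 2.0
    c = newton_solve(lambda c: relation(a, b, c), target=x, guess=1.0)
    print(c)  # close to 2.0, since 1 + 2*2 + 2**3 == 13
```

The equation itself does not decide which variable is the unknown. The software decides, based on what the rest of the model already provides.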
