[cellml-discussion] Describing rules for translating expressions into arbitrary languages

Andrew Miller Sun, 29 Apr 2007 15:50:51 -0700

Hi,

I am looking at refactoring the CCGS into several components, as 
discussed in an earlier e-mail. As part of this, I am looking at how I 
can separate out the language-specific parts of code generation. At this 
stage, I am focusing on how expressions get generated, rather than 
entire assignments. This will then be combined with code to generate the 
procedural steps required to evaluate a model. Writing a program to 
generate code for a new language will then be as simple as iterating 
through the procedural steps, writing out assignments of expressions 
into variables, in addition to supplying all the language-specific glue 
to the integrator.


I have defined a file format specification called MAL (MathML-language
mapping), designed to contain all the information needed
to generate expressions for a specific programming language. I would 
welcome any feedback anyone may have on the specification. I would be 
particularly interested in hearing if you can think of some extension to 
the language which is needed to support generation for a certain 
language. The specification follows...

MAL Format is intended as a succinct but complete description of how to
translate expressions from MathML into the syntax of another programming
language. It is intended to be both simpler and more powerful (within the
problem domain it is trying to address) than more generic approaches such as
XSLT.

Format:
The format consists of a series of tags. Each tag has a series of alphanumeric
characters (the tag name), followed by a colon and a space (": "), followed by
a series of characters (the tag value). The tag is terminated by a carriage
return or line-feed character, and the next tag starts at the first character
which isn't a carriage return or line feed.
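
As a rough illustration, the tag syntax described above could be read with
something like the following Python sketch. The names TAG_RE and parse_mal are
hypothetical, and allowing underscores in tag names (so that unary_minus is
accepted) is an assumption rather than something the format text states.

```python
import re

# Hypothetical pattern: a tag name, then ": ", then the value up to the end
# of the line. Underscores are allowed so that "unary_minus" parses.
TAG_RE = re.compile(r"([A-Za-z0-9_]+): (.*)")


def parse_mal(text):
    """Return a dict mapping tag names to tag values."""
    tags = {}
    # A tag ends at a carriage return or line feed; the next tag starts at
    # the first character that is neither, so runs of CR/LF are one break.
    for line in re.split(r"[\r\n]+", text):
        if not line:
            continue
        match = TAG_RE.fullmatch(line)
        if match is None:
            raise ValueError("not a valid tag: %r" % line)
        tags[match.group(1)] = match.group(2)
    return tags


example = "opengroup: (\r\nclosegroup: )\n\nplus: #prec[500]#exprs[+]\n"
print(parse_mal(example))
```

Later tags with the same name simply overwrite earlier ones in this sketch;
whether duplicates should be an error is left open.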

Where line-length formatting transforms are required (such as for FORTRAN 77),
a post-processing stage must be used to achieve this. The reason for this
design decision is that expressions alone do not determine line length.

The following tags are defined:

Name: opengroup
Value: A string which can be placed before another string to force that
  string to have the highest precedence.
Examples:
  opengroup: (
  Sets the open group string to be (, which is the open group character in
  languages like C.

Name: closegroup
Value: A string which can be appended after another string to force that
  string to have the highest precedence.
Examples:
  closegroup: )
  Sets the close group string to be ), which is the close group character in
  languages like C.

Name: The name of any MathML operator.
Value: A string describing the format. This string shall start with a
  description of operator precedence in the target language, and then describe
  a pattern for generating the target language expression.

  A precedence description is specified between #prec[ and ]. The following
  precedence descriptions can be used:

  #prec[n(m)] where n and m are integers between 0 and 1000. Sets the outer
  precedence to n (this is a precedence score for the resulting expression),
  and the inner precedence to m (this is a precedence score which operands
  must exceed if they are not to require opengroup / closegroup strings
  around them).

  #prec[n] where n is an integer is a shorthand for #prec[n(n)]

  #prec[H] is a shorthand for #prec[1000(0)].
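
The three precedence forms above can be sketched as a small Python parser.
This is only an illustration: PREC_RE and parse_prec are hypothetical names,
and the sketch assumes the rule really does start with a precedence
description, as the text requires.

```python
import re

# Matches #prec[H], #prec[n] or #prec[n(m)] at the start of a rule.
PREC_RE = re.compile(r"#prec\[(?:H|(\d+)(?:\((\d+)\))?)\]")


def parse_prec(rule):
    """Split a rule into (outer precedence, inner precedence, pattern)."""
    match = PREC_RE.match(rule)
    if match is None:
        raise ValueError("rule does not start with #prec[...]: %r" % rule)
    if match.group(1) is None:
        # #prec[H] is shorthand for #prec[1000(0)].
        outer, inner = 1000, 0
    else:
        outer = int(match.group(1))
        # #prec[n] is shorthand for #prec[n(n)].
        inner = int(match.group(2)) if match.group(2) is not None else outer
    if not (0 <= outer <= 1000 and 0 <= inner <= 1000):
        raise ValueError("precedence out of range in %r" % rule)
    return outer, inner, rule[match.end():]


print(parse_prec("#prec[H]fabs(#expr1)"))
print(parse_prec("#prec[500]#exprs[+]"))
print(parse_prec("#prec[1000(900)]atan(1.0/#expr1)"))
```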

  In an operator description, character sequences which are not matched below
  are written directly out to the output mathematics.

  #expri references the recursive expansion (according to the rules
  in the MAL file) of the ith operand, where i is a positive integer. The
  highest i value present also acts as the number of operands which must be
  present in the MathML to avoid an error.

  #exprs[text] expands to the concatenation of each consecutive operand after
  expansion according to the rules. The string text intervenes between
  operands, but is not added before the first operand or after the last.

  #logbase expands to the expansion of the logbase element contents. This is
  only valid for log. If no logbase element is found, the string 10 will be
  inserted.

  #degree expands to the expansion of the degree element contents. It is only
  valid for root. If no degree element is found, the string 2 will be inserted.

  #bvarIndex expands to the text of the bvarIndex annotation (as retrieved by
  the AnnotationSet supplied to MaLaES) on the source of the bound variable
  referenced.

  #uniquen (where n is an integer) expands to a globally unique integer. If
  #uniquen (for the same n) is used more than once in the same line, it refers
  to the same number. However, a different number is generated each time a rule
  is processed.

  #lookupDiffVariable (only valid on diff) finds the ci associated with the
  diff (differentiation of something other than a variable is not supported by
  this form, and will result in an error), and then finds the source variable
  associated with that ci. It then asks the supplied AnnotationSet for the
  degreeiname, where i is the degree of the diff.

  #supplement causes all subsequent output to be put into the supplementary
  stream, instead of the main output stream.
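
To make the #expri and #exprs[text] directives concrete, here is a minimal
Python sketch that substitutes operands which have already been expanded into
a rule pattern. TOKEN_RE and expand_pattern are hypothetical names. The sketch
treats the highest i as a minimum operand count (an assumption), ignores
precedence and grouping, and leaves the other directives (#logbase, #degree,
#bvarIndex, #uniquen, #lookupDiffVariable, #supplement) untouched.

```python
import re

# Matches "#exprs[text]" (group 1 is the separator text) or "#exprN"
# (group 2 is the positive operand index N).
TOKEN_RE = re.compile(r"#exprs\[([^\]]*)\]|#expr([1-9][0-9]*)")


def expand_pattern(pattern, operands):
    """Substitute already-expanded operand strings into a rule pattern."""
    indices = [int(m.group(2)) for m in TOKEN_RE.finditer(pattern)
               if m.group(2) is not None]
    # The highest index used is the number of operands that must be present.
    if indices and max(indices) > len(operands):
        raise ValueError("pattern needs %d operands, got %d"
                         % (max(indices), len(operands)))

    def substitute(match):
        if match.group(2) is not None:
            # #exprN: the Nth operand (1-based).
            return operands[int(match.group(2)) - 1]
        # #exprs[text]: all operands, with text between them only.
        return match.group(1).join(operands)

    # Everything that is not a directive is copied to the output unchanged.
    return TOKEN_RE.sub(substitute, pattern)


print(expand_pattern("pow(#expr1, #expr2)", ["x", "2"]))  # pow(x, 2)
print(expand_pattern("#exprs[+]", ["a", "b", "c"]))       # a+b+c
```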

Name: unary_minus
Value: unary_minus works just like the MathML operator elements described
  above. However, the MathML operator minus is only processed according to the
  minus rule if it has two children. If it has one child, it is processed
  according to the unary_minus rule. If it has any other number of children,
  an error is raised.
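
The choice between the minus and unary_minus rules can be sketched in Python
as below. The function name rule_name_for is hypothetical, and the sketch
assumes the caller already knows the MathML operator name and how many
operands it has.

```python
def rule_name_for(operator, child_count):
    """Pick the MAL tag name to use for a MathML operator application."""
    if operator != "minus":
        # Every other operator uses the rule named after it.
        return operator
    if child_count == 2:
        return "minus"
    if child_count == 1:
        return "unary_minus"
    # Any other number of children is an error.
    raise ValueError("minus applied to %d operands" % child_count)


print(rule_name_for("minus", 2))  # minus
print(rule_name_for("minus", 1))  # unary_minus
```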

I have also created a complete example, describing how to generate C
expressions:

opengroup: (
closegroup: )
abs: #prec[H]fabs(#expr1)
and: #prec[20]#exprs[&&]
arccos: #prec[H]acos(#expr1)
arccosh: #prec[H]acosh(#expr1)
arccot: #prec[1000(900)]atan(1.0/#expr1)
arccoth: #prec[1000(900)]atanh(1.0/#expr1)
arccsc: #prec[1000(900)]asin(1/#expr1)
arccsch: #prec[1000(900)]asinh(1/#expr1)
arcsec: #prec[1000(900)]acos(1/#expr1)
arcsech: #prec[1000(900)]acosh(1/#expr1)
arcsin: #prec[H]asin(#expr1)
arcsinh: #prec[H]asinh(#expr1)
arctan: #prec[H]atan(#expr1)
arctanh: #prec[H]atanh(#expr1)
ceiling: #prec[H]ceil(#expr1)
cos: #prec[H]cos(#expr1)
cosh: #prec[H]cosh(#expr1)
cot: #prec[900(0)]1.0/tan(#expr1)
coth: #prec[900(0)]1.0/tanh(#expr1)
csc: #prec[900(0)]1.0/sin(#expr1)
csch: #prec[900(0)]1.0/sinh(#expr1)
diff: #lookupDiffVariable
divide: #prec[900]#expr1/#expr2
eq: #prec[30]#exprs[==]
exp: #prec[H]exp(#expr1)
factorial: #prec[H]factorial(#expr1)
factorof: #prec[30(900)]#expr1 % #expr2 == 0
floor: #prec[H]floor(#expr1)
gcd: #prec[H]gcd_multi(#count, #exprs[, ])
geq: #prec[30]#exprs[>=]
gt: #prec[30]#exprs[>]
implies: #prec[10(950)] !#expr1 || #expr2
int: #prec[H]defint(func#unique1, BOUND, CONSTANTS, RATES, VARIABLES, #bvarIndex)#supplement double func#unique1(double* BOUND, double* CONSTANTS, double* RATES, double* VARIABLES) { return #expr1; }
lcm: #prec[H]lcm_multi(#count, #exprs[, ])
leq: #prec[30]#exprs[<=]
ln: #prec[H]log(#expr1)
log: #prec[H]arbitrary_log(#expr1, #logbase)
lt: #prec[30]#exprs[<]
max: #prec[H]multi_max(#count, #exprs[, ])
min: #prec[H]multi_min(#count, #exprs[, ])
minus: #prec[500]#expr1 - #expr2
neq: #prec[30]#expr1 != #expr2
not: #prec[950]!#expr1
or: #prec[10]#exprs[||]
plus: #prec[500]#exprs[+]
power: #prec[H]pow(#expr1, #expr2)
quotient: #prec[900(0)] (int)(#expr1) / (int)(#expr2)
rem: #prec[900(0)] (int)(#expr1) % (int)(#expr2)
root: #prec[1000(900)] pow(#expr1, 1.0 / #degree)
sec: #prec[900(0)]1.0 / cos(#expr1)
sech: #prec[900(0)]1.0 / cosh(#expr1)
sin: #prec[H] sin(#expr1)
sinh: #prec[H] sinh(#expr1)
tan: #prec[H] tan(#expr1)
tanh: #prec[H] tanh(#expr1)
times: #prec[900] #exprs[*]
unary_minus: #prec[950]-#expr1
xor: #prec[25(30)] (#expr1 != 0) ^ (#expr2 != 0)
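
As a worked illustration of how the precedence scores in these C rules might
drive grouping, the Python sketch below wraps an operand in the opengroup /
closegroup strings when needed. It is not the actual implementation:
group_if_needed is a hypothetical name, and it assumes an operand is grouped
unless its outer precedence is strictly higher than the enclosing operator's
inner precedence (the reading under which a - (b - c) keeps its parentheses).

```python
def group_if_needed(text, operand_outer, parent_inner,
                    opengroup="(", closegroup=")"):
    """Wrap an expanded operand in the group strings when required."""
    # Assumption: no grouping only when the operand binds strictly tighter
    # than the enclosing operator's inner precedence.
    if operand_outer <= parent_inner:
        return opengroup + text + closegroup
    return text


# (outer, inner) precedence pairs taken from the C rules above.
PLUS = (500, 500)    # plus:  #prec[500]
MINUS = (500, 500)   # minus: #prec[500]
TIMES = (900, 900)   # times: #prec[900]
ABS = (1000, 0)      # abs:   #prec[H], i.e. #prec[1000(0)]

print(group_if_needed("a+b", PLUS[0], TIMES[1]))     # (a+b)
print(group_if_needed("a+b", PLUS[0], ABS[1]))       # a+b
print(group_if_needed("b - c", MINUS[0], MINUS[1]))  # (b - c)
```

Under this reading a sum used as an operand of times is parenthesised, while
the same sum passed to fabs is not, because #prec[H] sets the inner precedence
to 0.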

Best regards,
Andrew

_______________________________________________
cellml-discussion mailing list
[email protected]
http://www.cellml.org/mailman/listinfo/cellml-discussion
