Re: [antlr-dev] v4 code gen

Kay Röpke Fri, 23 Apr 2010 04:12:22 -0700

hi!

On Apr 23, 2010, at 2:59 AM, Terence Parr wrote:

> hi, started thinking about it...
>
> http://www.antlr.org/wiki/display/~admin/v4+code+generation

while i totally agree with the goal of reducing the number of templates, nested conditionals get messy, too. recently i wrote a relatively simple code generator for a custom google protobuf implementation; here's one of the simpler templates:


buildProtobuf(field, names) ::= <<
<if(field.transient)>
<! this field is computed from some other field(s), skip it in the merge code !>
// <field;format="variableName"> is transient, no sense in serializing it.
<else>
<if(field.repeated)>
// TODO serialize repeated fields properly, depends on the field type: native types are fine (even if boxed), message types are not fine at all.
<else>
<if(field.messageType)>
<! message type, we should not follow asset links !>
if (has<field;format="methodName">) {
    <names.outerClassName>.<field;format="shortTypeName">.Builder <field;format="variableName">Builder = <names.outerClassName>.<field;format="shortTypeName">.newBuilder();
    <field;format="variableName">Builder.setId(get<field;format="methodName">().getId());
    b.set<field;format="methodName">(<field;format="variableName">Builder);
}
<else>
<! plain native type just call protobuf builder !>
if (has<field;format="methodName">) {
    b.set<field;format="methodName">(get<field;format="methodName">());
}
<endif>
<endif>
<endif>
>>

as you can see, there are only three conditions, but even those make it icky to follow already. other parts of the template group are even worse, and antlr seems to have even more branches in its codegen.

now, having multiple templates is not ideal either.

part of the problem stems from not being able to tell which template applies in which circumstance. in the past i've tried to model templates after classes or methods, with one template for each variant of the output, much like antlr does today (although in antlr3 the division in the code isn't as clear). what i noticed with that approach is that there are often large chunks of text common to multiple templates. the next step was to factor those common parts out, but unfortunately that usually made things almost incomprehensible, too, not to mention that it ties the template structure very closely to the code structure. that's only going to be a problem if one can anticipate major refactorings in the code, though, and we can probably ignore it because antlr's code generator is likely to be pretty stable over time.

i've also noticed that a proliferation of setAttribute() calls makes it much, much harder to follow what's going on, so i totally agree with passing in sensible objects. most of the time i pass in the model objects directly; once in a while i wrap several model objects in "view controller" objects, just to be able to access related data in the templates without having to change my model.
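a minimal java sketch of the "view controller" idea, assuming a hypothetical FieldModel class with a single name property. the FieldView wrapper and its methodName()/variableName() helpers are illustrative stand-ins for the format="methodName" and format="variableName" renderers in the template above, not their actual implementation. to keep it self-contained, a plain static method plays the role of the template instead of calling the StringTemplate API, so the "template" receives one object rather than a series of setAttribute() calls.

```java
// Hypothetical model object: knows nothing about templates or naming conventions.
final class FieldModel {
    final String name; // e.g. "asset_id"
    FieldModel(String name) { this.name = name; }
}

// Hypothetical "view controller": wraps the model and exposes the derived
// values a template needs, so the model itself stays unchanged.
final class FieldView {
    private final FieldModel model;
    FieldView(FieldModel model) { this.model = model; }

    // "asset_id" -> "AssetId" (hypothetical stand-in for format="methodName")
    String methodName() {
        StringBuilder sb = new StringBuilder();
        for (String part : model.name.split("_")) {
            if (part.isEmpty()) continue; // skip empty pieces from "__" or a leading "_"
            sb.append(Character.toUpperCase(part.charAt(0))).append(part.substring(1));
        }
        return sb.toString();
    }

    // "asset_id" -> "assetId" (hypothetical stand-in for format="variableName")
    String variableName() {
        String m = methodName();
        return m.isEmpty() ? m : Character.toLowerCase(m.charAt(0)) + m.substring(1);
    }
}

final class SetterTemplate {
    // Stand-in for a template: it receives one view object instead of separate
    // setAttribute("methodName", ...) / setAttribute("variableName", ...) calls.
    static String render(FieldView f) {
        String m = f.methodName();
        return "if (has" + m + ") {\n"
             + "    b.set" + m + "(get" + m + "());\n"
             + "}\n";
    }
}
```

adding another derived value later means adding a method to FieldView; neither the model class nor the call site that hands the object to the template has to change.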

perhaps a sensible approach would be to have a hierarchy of "token ref representation" classes, which get instantiated depending on the context of the token reference (that would be some decision in a tree walker, i guess). otoh, that would again give a 1-to-1 relationship with the number of templates :( but in effect most of the tokenref templates are already heavily factored, referring to one another and to other common elements like listLabel.
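a rough java sketch of such a hierarchy, assuming hypothetical class names (TokenRefRep and its three subclasses) and a static create() factory that stands in for the tree walker's decision. the emitted strings such as match(ID) are made up for illustration and are not antlr's real generated code. each subclass plays the role of one template, which makes the 1-to-1 relationship mentioned above visible.

```java
// Hypothetical representation of a token reference. The context is inspected
// once, in create(), so each render() needs no conditionals of its own.
abstract class TokenRefRep {
    final String token;
    TokenRefRep(String token) { this.token = token; }

    // Each subclass plays the role of one template.
    abstract String render();

    // Stand-in for the tree walker's decision about which representation to build.
    static TokenRefRep create(String token, String label, boolean listLabel) {
        if (label == null) return new PlainTokenRef(token);
        if (listLabel) return new ListLabeledTokenRef(token, label);
        return new LabeledTokenRef(token, label);
    }
}

// Unlabeled reference: just match the token.
final class PlainTokenRef extends TokenRefRep {
    PlainTokenRef(String token) { super(token); }
    String render() { return "match(" + token + ");"; }
}

// Labeled reference: keep the matched token in a variable.
final class LabeledTokenRef extends TokenRefRep {
    final String label;
    LabeledTokenRef(String token, String label) { super(token); this.label = label; }
    String render() { return label + " = match(" + token + ");"; }
}

// List-labeled reference: collect the matched token in a list.
final class ListLabeledTokenRef extends TokenRefRep {
    final String label;
    ListLabeledTokenRef(String token, String label) { super(token); this.label = label; }
    String render() { return "list_" + label + ".add(match(" + token + "));"; }
}
```

three contexts give three classes, one per template, so this only moves the branching out of the templates; it does not reduce the number of variants.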

i think that just introducing some representation classes would make the templates much simpler. take, for example, the various matching templates like lexerStringRef, wildcard et al.: they all have an <if(label)>...<endif> clause. by passing in a label representation object instead of the label string, that could collapse down to <label>, pushing some of the logic back into the code generator. it's entirely feasible to have a "null label" that expands to the empty string if there is no label. i guess that once you start looking at the ifs, many of them turn out to be of this kind. another example: wildcardChar vs wildcardCharListLabel. the latter is a superset of the former, but now you have two templates instead of one, rather than always assuming there is a list label, even if that might be the null list label.
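a minimal java sketch of the null label idea. LabelRep, AssignLabel, ListLabel and WildcardTemplate are hypothetical names, and matchAny() and the list_x.add(x) form are invented placeholders rather than antlr's real output. it relies on the fact that a template expression like <label> renders the attribute's toString() when no renderer is registered for its type; java string concatenation mimics that here. a shared label that prints "" then removes the need for <if(label)>...<endif>, and a single template can cover both the wildcardChar and wildcardCharListLabel cases.

```java
// Hypothetical label representation. The shared NULL instance renders as "",
// so a template can always emit <label> without an <if(label)> guard.
abstract class LabelRep {
    static final LabelRep NULL = new LabelRep() {
        @Override public String toString() { return ""; }
    };
}

// Plain label: assigns the matched element to a variable.
final class AssignLabel extends LabelRep {
    private final String name;
    AssignLabel(String name) { this.name = name; }
    @Override public String toString() { return name + " = "; }
}

// List label: additionally appends the labeled element to a list.
final class ListLabel extends LabelRep {
    private final String name;
    ListLabel(String name) { this.name = name; }
    @Override public String toString() { return " list_" + name + ".add(" + name + ");"; }
}

final class WildcardTemplate {
    // Stand-in for one template, e.g. (hypothetically)
    //   wildcard(label, listLabel) ::= "<label>matchAny();<listLabel>"
    // instead of separate wildcardChar / wildcardCharListLabel variants.
    static String render(LabelRep label, LabelRep listLabel) {
        return label + "matchAny();" + listLabel; // concatenation calls toString()
    }
}
```

with both arguments set to LabelRep.NULL the output is just matchAny();, so the old two-template split becomes a choice of arguments made in the code generator.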


probably i've been chasing template invocation chains for too long ;)

btw, the example i pasted above used to rely on template invocations for the various cases, which made it completely unreadable over time. that kind of invalidates my point, because i've gone back to the ifs, but i guess it just illustrates that it is a thin line.


cheers,
-k
--
Kay Röpke

_______________________________________________
antlr-dev mailing list
[email protected]
http://www.antlr.org/mailman/listinfo/antlr-dev
