Askar Safin wrote: > Hi. Often the only thing I want to do in Bison is just generate > AST and nothing else. Unfortunately, in this case code becomes > very repetitive. For example, this is Bison input I used to > generate AST for small JavaScript subset: > https://zerobin.net/?b9af68c9aa7a31a9#JB4wZseYTq9aKfZOwwZrDqBNCqlEZRj/+DM9bKdgtKU= > . Please, support some feature, which eliminates the need to write > such repetitive code. Such code can be easily generated from > grammar alone
Indeed, I had discussed something similar with Akim Demaille privately some time ago; he also mentioned it on this list a few weeks ago (I guess I'm one of the "some people" he refers to, but unfortunately I was too busy then to get involved): : Having a means to ask Bison to generate actions given some form of : a generic pattern is a different topic. It makes a lot of sense, : and some people have discussed about this on various occasions : here. That would help to bind to some abstract factory for ASTs : for instance, without forcing a specific AST format. So let me restate and expand upon my suggestion here, which is, of course, just a rough sketch. This refers mainly to the C++ output as it makes heavy use of templates and overloading. Maybe something similar can be done in Java etc., but I'll leave that to the respective experts. In C, I think this will only work in a less general way which requires more effort by the end-user. Actually, this would not only mostly automatically generate actions to generate ASTs and the like, but also some normal actions in my current grammars that don't primarily build ASTs, but contain actions that build "mini sub-ASTs", i.e. structures that are later acted upon in other actions, e.g. parameter lists while parsing a function declaration or call. I guess such actions might be common in many grammars. AFAIK, Bison's current behaviour applied to any rule can be summed up like this (I hope I'm not missing important details): 1. If a user-specified action is given, output this action and nothing else; otherwise: 2. if $$ has no declared type, do nothing; otherwise: 3. if there are no RHS symbols, default-initialize $$ (when Bison knows how to, esp. when using variants); otherwise: 4. output the default action "$$ = $1;" (possibly with auto-move), warning if the declared types of $$ and types are different (which may still result in a successful compilation if the types are compatible, or a compiler-error down the line if not). As a side note, though I suggest to introduce this default action into C++, I'm not sure if it's actually useful to apply it (in any language) when there's more than one RHS argument (or at least, more than ne typed one, more about this below). But if that's required by POSIX, we'd have no choice, at least in C. My suggestion is basically to insert another step (optional, of course) to auto-generate actions just based on the types involved before 3., or even before 2. -- the latter could be useful to generate actions that consume parsed data (e.g. to compile a function or to store a declaration in an interpreter) rather than build structures such as ASTs, e.g.: 1.5: if an auto-action for the types of $$ and $n can be generated, do so; otherwise: ... So how can such an auto-action be specified? A rather general way may use a form similar to %printer, e.g.: %auto-action { build_foo ($$, $^); } <foo>; Then any rule where $$ is of type foo and which has no explicit user-action would use build_foo automatically. %auto-action { build_default ($$, $^); } <*>; This would apply to any type of $$ not mentioned explicitly. I'm using a new special token "$^" here (the name is only exemplary, in analogy with make) to insert the RHS arguments as a comma-separated list (again with auto-move if enabled). More precisely, only those RHS arguments with a declared type, since (a) Bison wouldn't know how to insert arguments without declared types anyway (esp. with variants), and (b) automatically omitting them allows further simplification in a lot of common (IMHO) cases, e.g.: expr: '(' expr ')'; would then auto-generate: build_auto ($$, $2); without having to mention $2 explicitly (assuming the '(' and ')' tokens have no type), or in a hypothetical language: decl: "function" id '(' parlist ')' attrs ';'; would auto-generate: build_auto ($$, $2, $4, $6); discarding all the syntactic fluff automatically. An even more general way would be to discriminate on the types of $$ and $n, like this: %auto-action { build_foo ($$, $1, $2); } <foo, bar, baz>; This would avoid "$^" because the number of RHS arguments is specified explicitly. However, (a) I fear this might be a lot of effort to implement (whereas the discrimination on $$ only can hopefully reuse the %printer infrastructure), (b) might actually require more effort by the user to implement more such directives, and (c) might mainly be of interest for C grammars. In C++ (and perhaps Java etc.), OTOH, I'd rather let the compiler do the selection; I might only ever use: %auto-action { build_default ($$, $^); } <*>; to defer it all to a single overloaded and templated function. So then how could this function be implemented? (Note, the following is just a collection of ideas, not tested and probably with a number of more or less hairy issues, but just to convey the idea. I'm assuming auto-move throughout.) First a very general version that constructs $$ from the arguments if possible. This already covers case 3. above (if T has a default constructor), as well as case 4. with the side note (if T has a copy or move constructor), but also actions that must just pack their arguments in some simple structure (e.g. std::pair, std::struct or a user-defined structure) for later use, or that build some object using any of its constructors. // probably need enable_if<is_constructible ...> here ... template<typename T, typename... R> void build_default (T& t, R&&... r) { t = T { std::forward<R> (r)... }; } Another overload could automatically build smart pointers (here, unique_ptr, but likewise for shared_ptr or similar) when applicable: template<typename T, typename... R> void build_default (std::unique_ptr<T>& t, R&&... r) { t = std::make_unique (std::forward<R> (r)...); } When building containers (e.g. parameter lists), the rules that build the start (i.e., a container with 0 or 1 elements, depending on the grammar) should already be covered by the general version above using the default constructor or single-element list initialization. For rules that extend containers, we could have: // maybe enable_if<is_container<T>> ... template<typename T, typename... R> void build_default (T& t, T&& c, R&&... r) { t = std::move (c); t.emplace_back (std::forward<R> (r)...); } Such basic and exemplary overlads could be shipped with Bison, either in a generated header (optionally), or a static header distributed with Bison (though I know Bison doesn't install any headers so far, so this would be something new), or at least in the documentation, so users would have the tricky parts taken care of already. Auto-generating most actions (esp. for AST builders) might then become as simple as adding one "<*>" action, and declaring all symbols with suitable types (which is good style anyway), so the overloads can apply automatically. User code could add more application-specific overloads where needed. E.g. in some of my grammars I have a number of rules like this, one for each precedence group of operators: expr: expr or_op expr { $$ = build_binary_expr ($1, $2, $3); }; expr: expr and_op expr { $$ = build_binary_expr ($1, $2, $3); }; Then I could instead write just once: void build_default (Exp& t, Exp&& a, Op&& b, Exp&& c) { t = build_binary_expr (std::move (a), std::move (b), std::move (c)); } In any case, user code keeps all flexibility. If the general overloads are not wanted for some rule, one can add a more specific overload for the types involved, and if that won't do, one can just write an explicit action like before which will disable all auto-generation for this rule. To cover case 2. above, we'd just need a special syntax for "no $$" ("void" won't do with the syntax above, since "void &" is not valid). E.g., just dropping $$ from the parameter list in this case (and not covering this case with "<*>"), so e.g. to auto-generate rules to print everything that's discarded due to lack of an action, something like: %auto-action { do_print ($^); } <>; : The thing is... We need more hands on Bison. I know, and unfortunately I'll still be busy for the next few weeks, but I'm optimistic I can find some time to work on Bison next year. This would be a feature of interest (and immediate use) to me, so I think I could work on this; esp. the "build_default" implementations (which is basically normal, though heavily templated, C++ programming), including testing them with my grammars. For the work required in Bison proper, I might need some help, though I hope it might be as simple as adding one new directive like "%printer". Cheers, Frank