Well if the gang-of-four can write rubbish so can we ... :) Various patterns occur regularly when doing grammars.
This note just reports some patterns I've found in Felix dynamically loaded grammars, and how I'm trying to simplify user specification of grammars and actions with sugar on top of Dypgen. This is for information only, not a suggestion to add features to Dypgen -- in fact I don't really care about Dypgen grammar syntax much, since most of the Felix grammars are loaded dynamically, and I'm doing my best to reduce the initial Dypgen grammar to a minimal kernel. At present, Felix loads the *Felix* grammar almost entirely dynamically: only the base attributed terminals such as identifiers, numbers, and strings are mapped to non-terminals because my modelling scheme is unable to extract attributes from terminals (because they have the wrong type). The dynamic loading is what I would call 'First Order' in the same sense as polymorphic functions: you can use the bootstrap grammar to add arbitrary syntax to Felix. However there is no 'Second Order' support, which would allow you to specify new syntax for specifying syntax. So this note is looking at the hand coded sugars for specifying grammars, with some view to understanding enough to think about how to make a second order parser. Note that Felix is already *technically* capable of this, in the sense that user actions are arbitrary Scheme programs -- all that is required is to map some of the extension operations currently hand coded into Scheme: the real question is what syntax to use to specify higher order grammars. ************* Felix aims to make grammar specification and verification easy, and this must include syntactic sugars to: * make it easy to write grammars * make it easy to verify an encoding matches a specification Many grammar specifications use variations on BNF. Felix currently provides the following basic form: nt := rhs =># action ; where rhs is a sequence of symbols and action is a Scheme program using the magic variable names _sr -- the source reference _1 _2 etc .. the attribute of the n'th symbol which must be a nonterminal As sugar for nt := rhs1 =># action1; nt := rhs2 =># action2; .. you can write: nt := | rhs1 =># action1 | rhs2 =># action2 .. ; In addition, if the rhs consists of a *single* nonterminal, you can leave off the action, and it defaults to "_1", which just propagates the attribute. This particular feature is likely to be augmented with a new technology: Dypgen allows productions and nonterminals to be parametrised by priorities. I want people to be able to write something like: expr:= term | ... term:= factor | ... factor:= atom | ... ( expr ) .. and what we desugar to is %relation atom < factor < term < expr expr' := | ... priority expr; expr' := | ... priority term; factor := | .. ( expr'(<=expr) ) ... priority factor This form has two advantages: * it is FASTER to parse, since there are no spurious 'chain' reductions that just pass values up the precedence chain. * it is possible to insert new levels into the chain dynamically.. this cannot be done with hard coded nonterminals without deleting and replacing chaining rules. With priorities, Dypgen allows a new priority to be inserted between others dynamically. The only issue is what happens to chain links in existing productions which would skip over the new level. This depends how they're coded. For example expr(term) := expr(<=term) + expr(<term) will capture a new level below term but above factor, but the encoding: expr(term) := expr(<=term) + expr(<=factor) will not, in the RHS component. However it seems the client has full control of this in the Dypgen syntax .. but NOT in my proposed Felix 'simplified' syntax, which aims to make the use of priorities 'transparent'. Iterators ********* Within a production, Felix now allows these sugars: nt * nt + nt ? These are the usual possibly empty list, non-empty list, and option types. The user actions make Scheme programs which create a list in the first two cases, or `(some ,_1) `none for the option case. The lists are made by LEFT recursion in reverse order, then reversed, because this is most efficient in an LR parser. At present these sugars do NOT propagate local_data or keep the grammar. A new feature is: ( rhs ) This is a group, it is replaced by a fresh nonterminal fresh_nt, and the rule fresh_nt := rhs =># action is added. This allows ( rhs )* for example. The action returns a list of attributes of the nonterminals in rhs (so terminals are skipped). This feature should probably be extended to support arbitrary RHS including actions and alternatives. PROBLEM: I guessed at the user actions, but they don't obey basic laws. For example nt + = nt nt* is false. nt+ is a non-empty list. nt nt* is a list with two elements, the first is nt, and the second is the list made by nt*. In fact the isomorphism can be expressed: nt + =># "_1" = nt nt* =># "(cons ,_1 ,_2)" that is, the user action fixes it. Still it is very unpleasant because some grammar specifications I am looking at say things like: (name (, name))? which is intended to mean 'a possibly empty list of names, separated by commas if there is more than one'. The term above indeed parses that .. but it doesn't return a list. The *ideal* thing here is to eliminate as many user action specifications as possible. This is very important in Felix not just because it is simpler and therefore less prone to error, but because the user actions have to be written in Scheme, which is not so easy to write, and debugging them is a serious hassle, since OCS Scheme has limited debug support AND because the Scheme programs are executed under control of Dypgen and wrapper actions, not directly by the user. Poor mans Keywords ****************** Another sugar Felix provides is the symbol "symbol" written as a string. This matches a Felix identifier first, but the system user action then compares its string value with he given string "symbol" and raises Giveup, causing the GLR thread to die. These poor mans terminals are currently treated as terminals, that is, they don't return an attribute. -- John Skaller <skaller at users dot sf dot net> Felix, successor to C++: http://felix.sf.net ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ _______________________________________________ Felix-language mailing list Felix-language@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/felix-language