The new dypgen-based syntax extension mechanism has its first new feature .. :) It seems to work in one case:
/// fragment of lib/nugram.flx //////
satom := match sexpr with smatching+ endmatch =># "`(ast_match (,_2 ,_4))";
/*
smatchings := smatching =># "`(,_1)";
smatchings := smatching smatchings =># "(cons _1 _2)";
*/
smatching := | spattern => sexpr =># "`(,_2 ,_4)";
smatching := | => sexpr =># "`(pat_none ,_4)";
////////////////////////////////////////

You can now write nt*, nt+, or nt? in a grammar production, as shown, for a possibly empty sequence of nt, a non-empty sequence of nt, or an optional nt. In the example, smatching+ is a replacement for the nonterminal smatchings, which is commented out. This works by defining extra rules:

smatching__nelist := smatching =># "`(,_1)";
smatching__nelist := smatching smatching__nelist =># "(cons _1 _2)";

The suffixes __list, __nelist, and __opt are used for a possibly empty list, a non-empty list, and an optional item. The optional construction for x returns none for the missing case and (some x) otherwise. Both list forms return an ordinary Scheme list; an empty list is just '() in Scheme.

A small change has also been made to the way rules are stored, so that the set of stored rules contains no duplicates. This was done in the hope that if you write, for example, fred+ in two places, the duplicate rules generated will be eliminated.

You should note that in the production:

satom := match sexpr with smatching+ endmatch =># "`(ast_match (,_2 ,_4))";

the + is not counted for purposes of numbering the non-terminal attributes. This is because smatching+ is replaced by smatching__nelist, which is only a single non-terminal.

At present +, * and ? must follow a single non-terminal. It is not possible to apply them to a sequence. For example you cannot write:

x := (a b)* c =># ....

because at present this would apply * to the ) symbol. That isn't allowed, since ) isn't a non-terminal, and ( .. ) currently means to parse a (, then to parse .., then to parse a ); i.e. the ( and ) are just ordinary tokens, not groupings. This may change, see below.
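To make the desugaring concrete, here is a rough sketch in Python of how the +/*/? sugar can be expanded into ordinary rules. The rule representation, the function name `desugar`, and the exact Scheme actions for * and ? are my assumptions for illustration; only the + expansion is spelled out in the post above.

```python
def desugar(nt, suffix):
    """Return (replacement_nonterminal, extra_rules) for nonterminal nt
    followed by one of the suffixes '+', '*' or '?'.
    A rule is (lhs, rhs_symbols, scheme_action); names are hypothetical."""
    if suffix == "+":
        new = nt + "__nelist"                       # non-empty list
        rules = [
            (new, [nt],      '`(,_1)'),             # single item
            (new, [nt, new], '(cons _1 _2)'),       # item then rest
        ]
    elif suffix == "*":
        new = nt + "__list"                         # possibly empty list
        rules = [
            (new, [],        "'()"),                # empty list case
            (new, [nt, new], '(cons _1 _2)'),
        ]
    elif suffix == "?":
        new = nt + "__opt"                          # optional item
        rules = [
            (new, [],        "'none"),              # missing case
            (new, [nt],      '`(some ,_1)'),        # present case
        ]
    else:
        raise ValueError("unknown suffix: " + suffix)
    return new, rules

# Storing generated rules in a set would drop duplicates if, say,
# fred+ is written in two places, as the post notes.
new, rules = desugar("smatching", "+")
```

The occurrence of `smatching+` in a production is then simply replaced by the single non-terminal `smatching__nelist`, which is why the + does not shift the attribute numbering.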
Because you can't write *, + and ? to mean themselves, substitute non-terminals have been introduced into the initial grammar and must be used for those tokens:

star  // for *
plus  // for +
quest // for ?

FUTURE EXTENSIONS -- MORE TEMPLATES
-----------------------------------

I hope the * + ? feature will be useful immediately, but it is hard coded. We can view these features as 'templates' or 'syntactic sugar', which are replaced by a new sequence of tokens PLUS some new rules which are defined. A generalisation of this would allow more or less arbitrary such sugar to be defined by the user. For example, some kind of way to write:

nonterminal + =># .. code to generate extension here ..

It's almost painfully obvious how to do this... USE DYPGEN! Specifically, we have:

statement:
  ...
  | SYNTAX NAME LBRACE dyprods RBRACE

dyprod:
  | NAME COLONEQUAL rhs PARSE_ACTION statement
  | NAME COLONEQUAL rhs PARSE_ACTION STRING SEMI
  ....

where rhs is any sequence of tokens other than PARSE_ACTION (=>#). So of course dyprod, rhs, etc. are *already* extensible with new productions; it remains to make it possible to specify user actions which somehow implement the templates.

FUTURE EXTENSIONS -- SCHEME ENVIRONMENT
---------------------------------------

At present, Felix has 3 separate extension mechanisms. First, the old preprocessor-based extension mechanism uses a fixed grammar plus a lookup table, which enables user brackets and infix operators to map to a function application, and allows statements to be parsed by a recursive descent interpretive parser, with user actions represented by existing syntax with special identifiers _1 _2 etc. representing holes. The parser actually parses the user actions to generate an 'almost' ordinary AST term, and records the AST terms matched by the symbols of the production.
The macro processor then plugs these argument terms into the _1 _2 parameters to produce a desugared term, which is then processed normally (as if the parser had generated it directly). This mechanism requires a lot of support and is restricted in the kinds of terms it can handle. It also doesn't scope, since the grammar is extended by the pre-processor.

The second mechanism implemented used Dypgen to do a similar job in a similar way. With dypgen, the extensions are specified with ordinary Felix statements, not the preprocessor, and scope properly. The extensions are packaged up into a Domain Specific Sub-Language (DSSL) definition. For example:

//////////////////////////////////
#keyword publish
syntax syn_document {
  statement := publish string_literal statement =># _3 ;
}
//////////////////////////////////

introduces a new statement form in which any statement can be prefixed with a descriptive comment. The DSSL 'syn_document' can then be enabled by saying:

open syntax syn_document;

This grabs the stored rules and sends them to Dypgen, which builds a new automaton including those rules. The automaton is only in scope within the production which contains the open directive, and doesn't affect any other parallel GLR parse attempts. The effect is to make the extension available from the point the directive is issued up to the end of the current module, function body, or other place where statements are parsed.

This mechanism is much better than the pre-processor hackery, because it is more modular (extensions are grouped into DSSLs), properly scoped, and it integrates seamlessly with the existing grammar and parser technology, instead of using an LR/recursive descent hybrid. In particular, there's no longer any need to trigger parsing of new statements with specific keywords.
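The scoping behaviour of `open syntax` can be modelled abstractly. The following Python sketch is purely illustrative (class and method names are my invention, and the real implementation rebuilds a Dypgen automaton rather than a rule set): a DSSL's rules extend only the current scope, and vanish when that scope ends.

```python
# Hypothetical model of lexically scoped grammar extension: opening a
# DSSL adds its rules to the innermost scope only.
class ScopedGrammar:
    def __init__(self):
        self.stack = [set()]            # stack of active rule sets

    def enter_scope(self):
        # e.g. entering a module or function body
        self.stack.append(set(self.stack[-1]))

    def exit_scope(self):
        self.stack.pop()

    def open_syntax(self, dssl_rules):
        # like 'open syntax syn_document;': extend the current scope
        self.stack[-1] |= set(dssl_rules)

    def active(self):
        return self.stack[-1]

g = ScopedGrammar()
rule = "statement := publish string_literal statement"
g.enter_scope()
g.open_syntax([rule])
inside = rule in g.active()             # extension visible here
g.exit_scope()
outside = rule in g.active()            # but not after the scope ends
```

This mirrors the description above: the extension is available from the directive to the end of the enclosing body, and parallel parse attempts that never entered the scope are unaffected.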
Still, this system suffers from the restriction that it can only handle non-terminals for which it has been specifically prepared: whilst expressions and statements are supported, there was no support for extending patterns, or any other sub-part of a term, such as the 'adjectives' that can be applied to functions (inline, noinline, virtual, etc).

The third mechanism eliminates this problem by defining all user actions as Scheme programs, which are executed to return an s-expression, that is, a Scheme term. These Scheme terms are plugged together using Scheme itself, and the final Scheme AST is converted to an intermediate S-expression type, and then to the original kind of Felix AST terms. Since Scheme is dynamically typed, there are no restrictions on what kinds of Felix AST terms can be built, or how, provided the scheme -> sex -> felix mapping can recognise the terms produced. That reduces the workload for handling arbitrary Felix terms to a single translation function.

Now, having explained all this: at present each Scheme term is represented as a string. When the corresponding nonterminal is reduced, the string is evaluated as Scheme in an environment enriched by variables named _1 _2 .. etc., defined as the Scheme terms which are the attributes of the non-terminals matched. However, any utility functions have to be defined in EVERY user action that needs them, because a new environment is constructed for every action. This ensures the evaluations of the actions don't interfere with each other. The following production:

///////////////////////////////////
selse_part := selifs else sexpr =># """
  (letrec ((fold_left
      (lambda (f acc lst)
        (if (null? lst) acc
          (fold_left f (f acc (car lst)) (cdr lst))))))
    (let ((f (lambda (result condthn)
        (let ((cond (car condthn)) (thn (cadr condthn)))
          `(ast_cond (,cond ,thn ,result))))))
      (fold_left f _3 _1)))
""";
//////////////////////////////////

defines 'fold_left' in Scheme, then applies it to construct a new term.
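For readers who don't speak Scheme, here is what that action computes, rendered in Python for illustration (the function name and tuple encoding of AST terms are my assumptions, not Felix's): the (cond, then) pairs from the elif clauses (_1) are folded around the final else expression (_3), producing a nested chain of ast_cond terms.

```python
from functools import reduce

def build_conditional(elifs, else_expr):
    """elifs: list of (cond, then) pairs; else_expr: the final else term.
    Each fold step wraps the accumulated result in another ast_cond,
    exactly as the Scheme fold_left above does."""
    def f(result, condthn):
        cond, thn = condthn
        return ("ast_cond", (cond, thn, result))
    return reduce(f, elifs, else_expr)

term = build_conditional([("c1", "t1"), ("c2", "t2")], "e")
# → ('ast_cond', ('c2', 't2', ('ast_cond', ('c1', 't1', 'e'))))
```

Note the innermost ast_cond holds the first elif: later clauses wrap earlier results, so evaluation order is recovered by the eventual consumer of the term.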
It would be useful if the definition of fold_left could be given just once, and made available to all the user actions. So this very long-winded explanation is presented to suggest that we need to provide a way to extend the user action code too, not just the grammar. The obvious way to start is the conventional way for a programming language: an environment in which subroutines can be defined and subsequently shared by user actions.

Scheme itself has a reputation for its ability to extend itself, and includes a macro facility, so quite a bit of interesting stuff can probably be done using Scheme itself, if the right basic extension interfaces are provided. If I can digress a bit: applying the Felix extension idea to C++, we need to keep track of which names are typedef names to make parsing possible at all, and a mutable store would be useful for that. No doubt more useful features will become evident as the experiment progresses.

--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net