In the latest commit I have added support for parsing regexps, which
can be used for user defined literals (or other stuff!).

Just to explain: "in the old days" you could write a grammar production:

   tstatement := "mystatement" sname ";" =># "()";

which would recognise

   mystatement fred;

Here the "mystatement" in the grammar production tells the Dypgen
lexer to recognise a string. Now you can do this:

////////////////////////
syntax fred {
  tstatement := max identifier sname ";" =># "()";
  regdef joe = "[..]";
  regdef max = "<..>";
  literal identifier =># "()";
  literal max =># "()";
  literal joe =># "`(ast_literal ,_sr (ast_string ,_1))";
  satom := joe =># "_1";
}
open syntax fred;
println$ "well .. " + [..];
<..> sabe joe;
/////////////////////

Here, we can write regular definitions (regdef) with the "usual"
syntax; in the example they are just plain strings. Then, with a
literal statement, we can map a regular expression to a non-terminal
symbol of the same name, and give an action to run when it is
recognised. Finally, we can use the non-terminal in a grammar
production.

The name used in a literal statement must currently be either:

(a) a predefined name, defined in
    src/compiler/flx_parse/flx_dypgen.dyp in the %lexer section
    using a let clause, or

(b) a regdef name defined in the same dssl (i.e. the { } of the
    same syntax extension).

Part (b) needs to be fixed to allow lookup in other dssls, probably
global lookup (qualified lookup makes more sense .. but it isn't used
for non-terminals at the moment).

A regdef should support the usual sequence, the alternation |
operator, the option ? operator, repetition by * and +, grouping with
(), strings in quotes, and character sets with charset "ABCDEFG".
A plain name binds to a regdef as explained above.

The use of regular expressions here differs from a grammar
non-terminal because grammar productions allow whitespace between
symbols, whereas regdefs do not. Of course grammar productions are a
lot more powerful than mere regular expressions.

Note also: the "rest of Felix" provides absolutely no support for
user defined literals at this stage.
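
The whitespace difference between regdefs and grammar productions
described above can be illustrated outside Felix. The following Python
sketch is only an analogy: it uses Python's re module, not Dypgen, and
the hex literal pattern and the names HEXNUM, TOKEN, tokenize,
matches_regdef and matches_production are all invented for the
illustration.

```python
import re

# Analogue of a regdef: the string "0x" followed by one or more
# characters from a character set. No gaps are allowed inside it.
HEXNUM = re.compile(r"0x[0-9A-F]+")


def matches_regdef(text):
    """True if the whole text is a single hex literal."""
    return HEXNUM.fullmatch(text) is not None


# One token: optional leading whitespace, then a name or a semicolon.
TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z_0-9]*|;)")


def tokenize(text):
    """Split text into tokens, discarding whitespace between them."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def matches_production(text):
    """Analogue of: tstatement := "mystatement" sname ";"."""
    try:
        toks = tokenize(text)
    except ValueError:
        return False
    return (
        len(toks) == 3
        and toks[0] == "mystatement"
        and toks[1] != ";"
        and toks[2] == ";"
    )


print(matches_regdef("0x1F"))                      # True
print(matches_regdef("0x 1F"))                     # False
print(matches_production("mystatement fred;"))     # True
print(matches_production("mystatement   fred ;"))  # True
```

The production-style matcher accepts any amount of whitespace between
its symbols, while the regdef-style pattern rejects input with a space
inside the literal.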
You can map a user defined literal to a string or some other supported
literal, or, indeed, to any AST node, but you can't do anything else,
such as adding post-parsing support. Of course, the Scheme user action
codes are pretty powerful. Post-parsing support might be added by
using Scheme elsewhere in the compiler. (In fact there's an argument
for replacing the whole compiler with Scheme augmented with high
performance functions written in OCaml.)

So why put this feature in? The answer is: because I want to eliminate
as much dependence on the pre-built parser code as possible. I want
the *user* to define the layout of floating point literals, add
complex number literals if they like, etc.

Lifting the literal formats into user space makes documentation
easier: you can look in the library instead of the OCaml sources. It
also makes it possible to handle other languages more natively; for
example, C identifiers are different to Felix ones. XML is another
kettle of fish. Etc.

--
john skaller
skal...@users.sourceforge.net
http://felix-lang.org