[Felix-language] Regexp tokens: User defined literals

john skaller Tue, 06 Mar 2012 13:10:01 -0800

In the latest commit, I have added support for parsing regexps,
which might be used for user defined literals (or other stuff!!).


Just to explain: "in the old days" you could write a grammar production:

  tstatement :=  "mystatement" sname ";" =># "()";

which would recognise

  mystatement fred;

Here the "mystatement" in the grammar production is telling
the Dypgen lexer to recognise a string. 

Now you can do this:

////////////////////////
syntax fred {
 tstatement := max identifier sname ";" =># "()";
 regdef joe = "[..]";
 regdef max = "<..>";
 literal identifier =># "()";
 literal max =># "()";
 literal joe =># "`(ast_literal ,_sr (ast_string ,_1))";
 satom := joe =># "_1";

}
open syntax fred;

println$ "well .. " + [..];
<..> sabe joe;
/////////////////////

Here, we can write regular definitions with the "usual" syntax,
which in this example are just plain strings.

Then, we can map a regular expression to a non-terminal symbol
of the same name, and give an action when it is recognised.

Finally, we can use the non-terminal in a grammar production.

The name used in a literal statement must currently be either:

(a) a predefined name, defined in src/compiler/flx_parse/flx_dypgen.dyp 
in the %lexer section using a let clause

(b) a regdef name defined in the same dssl (i.e. the { } of the
same syntax extension).

Part (b) needs to be fixed to allow lookup in other dssls, probably
global lookup (qualified lookup makes more sense .. but it isn't
used for non-terminals at the moment).

A regdef should support the usual sequence, alternation with the | operator,
option with the ? operator, repetition with * and +, grouping with (),
strings in quotes, and character sets with

        charset "ABCDEFG"

A plain name binds to a regdef as explained above.
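
As a rough sketch of how these operators might combine, the following
extension follows the same pattern as the example above. The names
(siglist, tdigit, tsign, tnum, tlist) and the token shape are made up for
illustration, the postfix placement of ?, * and + is assumed, and none of
this has been tested against the commit. Whether the token collides with an
existing Felix token has not been checked either.

```felix
syntax siglist {
  // hypothetical regdefs, built only from the operators listed above
  regdef tdigit = charset "0123456789";      // character set
  regdef tsign = "+" | "-";                  // alternation of two strings
  regdef tnum = tsign? tdigit+;              // option, then one or more digits
  regdef tlist = "<" tnum ("," tnum)* ">";   // sequence, grouping, repetition

  // map the regdef to a non-terminal of the same name, with the
  // same string action used for joe in the example above
  literal tlist =># "`(ast_literal ,_sr (ast_string ,_1))";
  satom := tlist =># "_1";
}
open syntax siglist;

println$ "list: " + <-1,+2,3>;
```

Because tlist is a regdef, the lexeme has to be written without internal
whitespace, so under these assumptions "<-1, +2>" would not match.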


The use of regular expressions here differs from a grammar
non-terminal because grammar productions allow whitespace
between symbols, whereas regdefs do not. Of course, grammar productions
are a lot more powerful than mere regular expressions.

Note also: the "rest of Felix" provides absolutely no support for
user defined literals at this stage. You can map them to strings
or some other supported literal, or, indeed, to any ast node,
but you can't do anything else like add post parsing support.
Of course, the Scheme user action codes are pretty powerful.
Post parsing support might be added by using Scheme
elsewhere in the compiler. (In fact there's an argument
to replace the whole compiler with Scheme augmented
with high performance functions written in Ocaml).

So why put this feature in?

The answer is: because I want to eliminate as much
dependence on the pre-built parser code as possible.
I want the *user* to define the layout of floating point
literals, add complex number literals if they like, etc.

Lifting the literal formats into user space makes documentation
easier: you can look in the library instead of the Ocaml sources.

It also makes it possible to handle other languages more natively,
for example C identifiers are different to Felix ones. XML is another
kettle of fish. Etc.

--
john skaller
skal...@users.sourceforge.net
http://felix-lang.org



