Well if the gang-of-four can write rubbish so can we ... :)

Various patterns occur regularly when doing grammars.

This note just reports some patterns I've found in Felix
dynamically loaded grammars, and how I'm trying to simplify 
user specification of grammars and actions with sugar on 
top of Dypgen. This is for information only, not a suggestion
to add features to Dypgen -- in fact I don't really care
about Dypgen grammar syntax much, since most of the Felix
grammars are loaded dynamically, and I'm doing my best
to reduce the initial Dypgen grammar to a minimal kernel.

At present, Felix loads the *Felix* grammar almost entirely
dynamically: only the base attributed terminals such as
identifiers, numbers, and strings are mapped to non-terminals
because my modelling scheme is unable to extract attributes
from terminals (because they have the wrong type).

The dynamic loading is what I would call 'First Order'
in the same sense as polymorphic functions: you can use
the bootstrap grammar to add arbitrary syntax to Felix.

However there is no 'Second Order' support, which would allow
you to specify new syntax for specifying syntax.

So this note is looking at the hand coded sugars for specifying
grammars, with some view to understanding enough to think about
how to make a second order parser. Note that Felix is already
*technically* capable of this, in the sense that user actions
are arbitrary Scheme programs -- all that is required is to
map some of the extension operations currently hand coded
into Scheme: the real question is what syntax to use to
specify higher order grammars.

*************

Felix aims to make grammar specification and verification
easy, and this must include syntactic sugars to:

* make it easy to write grammars
* make it easy to verify an encoding matches a specification

Many grammar specifications use variations on BNF.
Felix currently provides the following basic form:

        nt := rhs =># action ;

where rhs is a sequence of symbols and action is a Scheme
program using the magic variable names

        _sr -- the source reference
        _1 _2 etc .. the attribute of the n'th symbol
                which must be a nonterminal

As sugar for

        nt := rhs1 =># action1;
        nt := rhs2 =># action2;
        ..

you can write:

        nt := 
                | rhs1 =># action1
                | rhs2 =># action2
                ..
        ;

In addition, if the rhs consists of a *single* nonterminal,
you can leave off the action, and it defaults to "_1", which
just propagates the attribute.

This particular feature is likely to be augmented with 
a new technology: Dypgen allows productions and nonterminals
to be parametrised by priorities. I want people to be able
to write something like:

        expr:= term | ...
        term:= factor | ...
        factor:= atom | ... ( expr ) ..

and what we desugar to is

        %relation atom < factor < term < expr
        expr' := | ... priority expr;
        expr' := | ... priority term;
        factor := | .. ( expr'(<=expr) ) ... priority factor

This form has two advantages:

* it is FASTER to parse, since there are no spurious 'chain'
reductions that just pass values up the precedence chain.

* it is possible to insert new levels into the chain
dynamically.. this cannot be done with hard coded nonterminals
without deleting and replacing chaining rules. With priorities,
Dypgen allows a new priority to be inserted between others
dynamically.

The only issue is what happens to chain links in existing
productions which would skip over the new level. This depends
how they're coded. For example

        expr(term) := expr(<=term) + expr(<term)

will capture a new level below term but above factor, but the encoding:

        expr(term) := expr(<=term) + expr(<=factor)

will not, in the RHS component. However it seems the client
has full control of this in the Dypgen syntax .. but NOT in
my proposed Felix 'simplified' syntax, which aims to make
the use of priorities 'transparent'.

Iterators
*********

Within a production, Felix now allows these sugars:

        nt *
        nt +
        nt ?

These are the usual possibly empty list, non-empty list,
and option types. The user actions make Scheme programs
which create a list in the first two cases, or

        `(some ,_1)
        `none

for the option case. The lists are made by LEFT recursion
in reverse order, then reversed, because this is most
efficient in an LR parser. At present these sugars do NOT 
propagate local_data or keep the grammar.

A new feature is:

        ( rhs )

This is a group, it is replaced by a fresh nonterminal fresh_nt,
and the rule 

        fresh_nt := rhs =># action

is added. This allows ( rhs )* for example. The action
returns a list of attributes of the nonterminals in rhs
(so terminals are skipped).

This feature should probably be extended to support arbitrary
RHS including actions and alternatives.

PROBLEM: I guessed at the user actions, but they don't
obey basic laws. For example

        nt + = nt nt*

is false. nt+ is a non-empty list. nt nt* is a list with two elements,
the first is nt, and the second is the list made by nt*.
In fact the isomorphism can be expressed:

        nt + =># "_1" = nt nt* =># "(cons ,_1 ,_2)"

that is, the user action fixes it. Still it is very unpleasant
because some grammar specifications I am looking at say things like:

        (name (, name))?

which is intended to mean 'a possibly empty list of names,
separated by commas if there is more than one'. The term
above indeed parses that .. but it doesn't return a list.

The *ideal* thing here is to eliminate as many user action
specifications as possible. This is very important in Felix
not just because it is simpler and therefore less prone
to error, but because the user actions have to be written
in Scheme, which is not so easy to write, and debugging them
is a serious hassle, since OCS Scheme has limited debug support
AND because the Scheme programs are executed under control
of Dypgen and wrapper actions, not directly by the user.


Poor mans Keywords
******************

Another sugar Felix provides is the symbol

        "symbol"

written as a string. This matches a Felix identifier first,
but the system user action then compares its string value
with he given string "symbol" and raises Giveup, causing the
GLR thread to die. These poor mans terminals are currently 
treated as terminals, that is, they don't return an attribute.


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Felix-language mailing list
Felix-language@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/felix-language

Reply via email to