[Haskell-cafe] Re: Fun with uu-parsinglib

S. Doaitse Swierstra Thu, 29 Jul 2010 01:16:03 -0700

On 29 jul 2010, at 05:04, David Place wrote:

> Hi, Doaitse.
> 
> I am making good progress transcribing my parser to use your library.   I 
> think some ways that I have grown accustomed to working with Parsec will not 
> work, though.   Now, I am getting the run time error:
> 
>>  Result: *** Exception: cannot compute minimal length of right hand side of 
>> monadic parser
> 
> Is there an explanation of this error in the documentation?  I am trying to 
> combine with alternation some parsers which return the semantic data 
> structures of my program.  Perhaps there are restrictions on the kind of 
> parsers that can be combined with alternation.
> 
> Cheers,
> David
> ____________________
> David Place   
> Owner, Panpipes Ho! LLC
> http://panpipesho.com
> d...@vidplace.com
> 
> 
>



Dear David,

I am cc-ing this answer to Haskell-café since the answer may be of a wider 
interest.

Let me first explain a a sequence of design choices I made:

1) my parser combinators perform error correction by trying to insert missing 
tokens into and to delete superfluous tokens from the input
2) of course there are many possibilities to do so 
3) hence I implemented a search process which currently prunes the tree of 
possible corrective choices up to depth three
4) this process may return several solutions with the same overall cost and I 
have to pick one of those
5) in case of a draw I give preference to those alternatives which delete 
something, so progress is guaranteed
6) at the end of the input this does not work, since there is nothing to delete
7) if I pick an arbitrary alternative this may lead to a choice where we get an 
infinite sequence of insertions; suppose e.g. we want to insert an expression, 
and the system picks the if_alternative. If it does so it inserts an if-token 
and then the process starts all over again by trying to insert a conditional 
expression, which in general will be an expression, which will lead to the 
insertion of another if-token, etc.
8) in order to prevent this behaviour internally an abstract interpretation is 
made which computes the minimal number of tokens a parser will recognise (and 
thus can insert). The recursive alternatives will get an infinite length,  and 
will thus never be chosen unless you grammar is well-formed and all choices 
only can lead to an infinite sequence of insertions
9) if a choice has to be made for the correction process always a finite 
alternative is chosen, so the correction process is guaranteed to complete

Now the monads come in; they raise the question how to compute the minimal 
length of a parser WHICH DEPENDS ON A PREVIOUSLY RECOGNISED part of the input. 
In general this is impossible. So I made the decision to generate an error 
message in such cases, since I did not expect many people to run into this.

But now you, and probably many people used to writing parsers in the monadic 
(i.e. Parsec) style turn up, and you continue to use the monadic style bacuse 
you have become accustomed to a different way of writing parsers. Let me give 
an example, based on the following grammar for well-formed nested parentheses: 
S -> (S)S ; epsilon. We want to return e.g. the maximal nesting depth. This is 
one of the functions in the Examples.hs file in the uu-parsinglib. The 
prototypical way of writing this in applicative style, without using monads is:

wfp :: Parser Int
wfp =  max . (+1) <$> pParens  wfp <*> wfp 
      `opt` 
       0

Now let us take a look at the monadic formulation:

wfp  = do lt <- pParens wfp
          rt <-         wfp   -- < second call to wfp
          return ((lt + 1) `max` rt)
       <|> return 0

Now we see that the second call to wfp is within the scope of the binding of 
lt, and hence may depend on it. In this case it does not, but unfortunately the 
abstract interpretation cannot see this, and thus cannot do very much.

This shows in my view the bad thing about monadic parser combinators: they 
prohibit the self-analysis of parsers, they inhibit optimisations, and they 
inhibit feedback based on the analysis. More on this below.

Since I assume that the monadic formulation is only used as a last resort, i.e. 
when a parser REALLY depends on previous input, and that this will in general 
not be part of an alternative construct, I have decided to generate the above 
error message.

Another solution I could have chosen is to make the choice in the correction 
process based on the order in which the alternatives occur in the program, but 
this would create the need to explain to users that the order of their 
alternatives matters, and it is one of the nice things that thus far it does 
not. In the old uulib the order even gets completely lost because we internally 
build tables which assist in speeding up the parsing process.

There are several solutions now:

a) I just remove the error message and make an educated guess, leading to 
situations where users might get stuck in an infinite correction process (only 
at the end of the file; normally in case of a draw preference is given to 
delete something in order to guarantee progress), and then start to ask 
questions about what is going on. Notice that they will not get feedback except 
probably a stack overflow.
b) We leave it like this assuming that when people use a monadic style this 
will not be in the branch of an alternative construct
c) We add an extra combinator which removes the error message, and we require 
the user to indicate that a specific alternative should never be chosen in the 
insertion process
d) Internally I check to see  whether more than  specific number of insertions 
occur in a row, and then stop. This will however make parsing a bit more 
expensive, which I always try to avoid.

Let me now come back to explain g further why I think the monadic style should 
be avoided where-ever possible because they prohibit abstract interpretation.

If in the uu-parsinlib you write a parser like: pList (pList (pSym 'a')) you 
will get an an error message telling that this parser does not make sense, 
since the outer pList gets as an argument a parser which can accept the empty 
string. The good news is that this is something we can statically recognise, 
EVEN IN THE CASE we use monadic parsers. When we have p >>= q then either p is 
not empty, and if it is we know the value it returns and we can use this to see 
whether the q part may return empty. Notice that this is an analysis we have to 
do only once for each parser, and hence costs can be almost neglected.

My conclusion is: avoid monadic parsers as much as possible, and only use them 
where you really want to continue parsing with a parser which really depends on 
the input, like in:

n_as = do n <- pInt
          pEaxct (pSym 'a')

pExact 0 p = pReturn []
pExact n p = (:) <$> p <*> pExact (n-1) p

I have explained this observation before in my lecture notes for the Advanced 
Functional Programming summer school in 1996, section 5.2:


@inproceedings{SwieDupo96,
        Author = {Swierstra, S. D. and Duponcheel, L.},
        Booktitle = {Advanced Functional Programming},
        Editor = {Launchbury, John and Meijer, Erik and Sheard, Tim},
        Pages = {184-207},
        Publisher = {Springer-Verlag},
        Series = {LNCS-Tutorial},
        Title = {Deterministic, Error-Correcting Combinator Parsers},
        Urlpdf = 
{http://www.cs.uu.nl/people/doaitse/Papers/1996/DetErrCorrComPars.pdf},
        Volume = {1129},
        Year = {1996}}

Good luck with your further attempts, and let me know which of the a)-d) you 
prefer, and do not hesitate to inform me about the next problem you may run 
into ;-}}

Doaitse

PS: if you want to do any further semantic processing then you might consider 
using the uuagc compiler to build te functions which typically arise in from of 
the <$>'s: http://www.cs.uu.nl/wiki/bin/view/HUT/AttributeGrammarSystem

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Fun with uu-parsinglib

Reply via email to