Florian Hars writes:
 > [...] Show me any working programmer who reads the "Improving
 > laziness"-part in the Hutton/Meijer paper on monadic parser
 > combinators and says "Oh! What an elegant language! And these nifty
 > efficiency improving no-ops like (fst (head x), snd (head x)): tail
 > x !" (By the way, can anyone explain this section to me?)

Rather than a no-op, I prefer to think of it as a chiropractor: it
fixes up the spine.

In general, a parser may fail, which Hutton and Meijer represent by an
empty list of possible parses.  However, a parser which recognises
zero or more `a's - with the help of some other parser which
recognises exactly one `a' - is infallible.  It can always return
[([], s)] where s is the entire input string it was given, meaning "I
successfully found zero `a's."

The overly strict parse of zero or more `a's has this structure:

    do { x  <- p;
         xs <- (
                  do { x  <- p;
                       xs <- (
                                --etc.
                             );
                       return (x:xs) } +++ return []
               );
         return (x:xs) } +++ return []

where p is a parser which recognises exactly one `a', and p1 +++ p2
means "return the first possible parse from p1 if p1 succeeds,
otherwise the first possible parse from p2."  In this case, the x <- p
part may fail (because there isn't an `a' next in the input) but the
return [] part always succeeds.

This is overly strict because the spine of the expression, which
contains all the (+++) nodes, must be evaluated before the run time
system can (reasonably be expected to) determine the outermost
data constructor in the resulting list of possible parses.

The (fst (head x), snd (head x)) : tail x construction assures the run
time system that a parser for zero or more `a's is infallible.  It
works because the outermost (+++) is now of the form

    Foo (_:_) +++ Foo []

instead of

    Foo _ +++ Foo []

where Foo stands for some additional structure which doesn't affect
the strictness issue, and _ stands for an expression node which is not
necessarily a data constructor.

To express the same hint even more niftily, we'd need to be able to
express laws like "for all p, p +++ return [] is infallible."

Terminology: Someone more confident with the terms WHNF, closure,
suspension and thunk, could probably have used them in an explanation,
where I've resorted to waffling.  If I've used terms like spine
inappropriately, will someone please say so (and explain)?

Regards,
a working Informix-4GL programmer  :-)

Reply via email to