Re: Parser.y rewrite with parser combinators

Vladislav Zavialov Mon, 08 Oct 2018 15:25:39 -0700

> complex parsers written using parsing combinators is that they tend to be 
> quite difficult to modify and have any kind of assurance that now you haven't 
> broken something else


That's true regardless of implementation technique, parsers are rather
delicate. A LALR-based parser generator does provide more information
when it detects shift/reduce and reduce/reduce conflicts, but I never
found this information useful. It was always quite the opposite of
being helpful - an indication that a LALR parser could not handle my
change and I had to look for workarounds.

> With a combinator based parser, you basically have to do program 
> verification, or more pragmatically, have a large test suite and hope that 
> you tested everything.

Even when doing modifications to Parser.y, I relied mainly on the test
suite to determine whether my change was right (and the test suite
always caught many issues). A large test suite is the best approach
both for 'happy'-based parsers and for combinator-based parsers.

> and then have a separate pass that validates and fixes up the results

That's where my concern lies. This separate pass is confusing (at
least for me - it's not the most straightforward thing to parse
something incorrectly and then restructure it), it is hard to modify,
it does not handle corner cases (e.g. #1087).

Since we have all this Haskell code that does a significant portion of
processing, why even bother with having a LALR pass before it?

> namely we can report better parse errors

I don't think that's true, we can achieve better error messages with
recursive descent.

> Also, with the new rewrite of HsSyn, we should be able to mark such 
> constructors as only usable in the parsing pass, so later passes wouldn't 
> need to worry about them.

Not completely true, GhcPs-parametrized structures are the final
output of parsing, so at least the renamer will face these
constructors.

On Tue, Oct 9, 2018 at 1:00 AM Iavor Diatchki <iavor.diatc...@gmail.com> wrote:
>
> Hello,
>
> my experience with complex parsers written using parsing combinators is that 
> they tend to be quite difficult to modify and have any kind of assurance that 
> now you haven't broken something else.   While reduce-reduce errors are 
> indeed annoying, you at least know that there is some sort of issue you need 
> to address.   With a combinator based parser, you basically have to do 
> program verification, or more pragmatically, have a large test suite and hope 
> that you tested everything.
>
> I think the current approach is actually quite reasonable:  use the Happy 
> grammar to parse out the basic structure of the program, without trying to be 
> completely precise, and then have a separate pass that validates and fixes up 
> the results.   While this has the draw-back of some constructors being in the 
> "wrong place", there are also benefits---namely we can report better parse 
> errors.  Also, with the new rewrite of HsSyn, we should be able to mark such 
> constructors as only usable in the parsing pass, so later passes wouldn't 
> need to worry about them.
>
> -Iavor
>
>
>
>
>
>
>
>
>
>
>
> On Mon, Oct 8, 2018 at 2:26 PM Simon Peyton Jones via ghc-devs 
> <ghc-devs@haskell.org> wrote:
>>
>> I'm no parser expert, but a parser that was easier to understand and modify, 
>> and was as fast as the current one, sounds good to me.
>>
>> It's a tricky area though; e.g. the layout rule.
>>
>> Worth talking to Simon Marlow.
>>
>> Simon
>>
>>
>>
>> | -----Original Message-----
>> | From: ghc-devs <ghc-devs-boun...@haskell.org> On Behalf Of Vladislav
>> | Zavialov
>> | Sent: 08 October 2018 21:44
>> | To: ghc-devs <ghc-devs@haskell.org>
>> | Subject: Parser.y rewrite with parser combinators
>> |
>> | Hello devs,
>> |
>> | Recently I've been working on a couple of parsing-related issues in
>> | GHC. I implemented support for the -XStarIsType extension, fixed
>> | parsing of the (!) type operator (Trac #15457), allowed using type
>> | operators in existential contexts (Trac #15675).
>> |
>> | Doing these tasks required way more engineering effort than I expected
>> | from my prior experience working with parsers due to complexities of
>> | GHC's grammar.
>> |
>> | In the last couple of days, I've been working on Trac #1087 - a
>> | 12-year old parsing bug. After trying out a couple of approaches, to
>> | my dismay I realised that fixing it properly (including support for
>> | bang patterns inside infix constructors, etc) would require a complete
>> | rewrite of expression and pattern parsing logic.
>> |
>> | Worse yet, most of the work would be done outside Parser.y in Haskell
>> | code instead, in RdrHsSyn helpers. When I try to keep the logic inside
>> | Parser.y, in every design direction I face reduce/reduce conflicts.
>> |
>> | The reduce/reduce conflicts are the worst.
>> |
>> | Perhaps it is finally time to admit that Haskell syntax with all of
>> | the GHC cannot fit into a LALR grammar?
>> |
>> | The extent of hacks that we have right now just to make parsing
>> | possible is astonishing. For instance, we have dedicated constructors
>> | in HsExpr to make parsing patterns possible (EWildPat, EAsPat,
>> | EViewPat, ELazyPat). That is, one of the fundamental types (that the
>> | type checker operates on) has four additional constructors that exist
>> | due to a reduce/reduce conflict between patterns and expressions.
>> |
>> | I propose a complete rewrite of GHC's parser to use recursive descent
>> | parsing with monadic parser combinators.
>> |
>> | 1. We could significantly simplify parsing logic by doing things in a
>> | more direct manner. For instance, instead of parsing patterns as
>> | expressions and then post-processing them, we could have separate
>> | parsing logic for patterns and expressions.
>> |
>> | 2. We could fix long-standing parsing bugs like Trac #1087 because
>> | recursive descent offers more expressive power than LALR (at the cost
>> | of support for left recursion, which is not much of a loss in
>> | practice).
>> |
>> | 3. New extensions to the grammar would require less engineering effort.
>> |
>> | Of course, this rewrite is a huge chunk of work, so before I start, I
>> | would like to know that this work would be accepted if done well.
>> | Here's what I want to achieve:
>> |
>> | * Comparable performance. The new parser could turn out to be faster
>> | because it would do less post-processing, but it could be slower
>> | because 'happy' does all the sorts of low-level optimisations. I will
>> | consider this project a success only if comparable performance is
>> | achieved.
>> |
>> | * Correctness. The new parser should handle 100% of the syntactic
>> | constructs that the current parser can handle.
>> |
>> | * Error messages. The new error messages should be of equal or better
>> | quality than existing ones.
>> |
>> | * Elegance. The new parser should bring simplification to other parts
>> | of the compiler (e.g. removal of pattern constructors from HsExpr).
>> | And one of the design principles is to represent things by dedicated
>> | data structures, in contrast to the current state of affairs where we
>> | represent patterns as expressions, data constructor declarations as
>> | types (before D5180), etc.
>> |
>> | Let me know if this is a good/acceptable direction of travel. That's
>> | definitely something that I personally would like to see happen.
>> |
>> | All the best,
>> | - Vladislav
>> | _______________________________________________
>> | ghc-devs mailing list
>> | ghc-devs@haskell.org
>> | https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmail.hask
>> | ell.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fghc-
>> | devs&amp;data=02%7C01%7Csimonpj%40microsoft.com%7C19181de5c6bd493ab07a08d
>> | 62d5edbe0%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636746282778542095
>> | &amp;sdata=lFRt1t4k3BuuRdyOqwOYTZcLPRB%2BtFJwfFtgMpNLxW0%3D&amp;reserved=
>> | 0
>> _______________________________________________
>> ghc-devs mailing list
>> ghc-devs@haskell.org
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: Parser.y rewrite with parser combinators

Reply via email to