Re: Parser.y rewrite with parser combinators

Vanessa McHale Mon, 08 Oct 2018 20:47:29 -0700

I actually have some experience in this department, having authored both
madlang <http://hackage.haskell.org/package/madlang> and language-ats
<http://hackage.haskell.org/package/language-ats>. Parsers using
combinators alone are more brittle than parsers using Happy, at least
for human-facing languages.


I'm also not sure what exactly parser combinators provide over Happy. It
has macros that can emulate e.g. between, many. Drawing up a minimal
example might be a good idea.


On 10/08/2018 05:24 PM, Vladislav Zavialov wrote:
>> complex parsers written using parsing combinators is that they tend to be 
>> quite difficult to modify and have any kind of assurance that now you 
>> haven't broken something else
> That's true regardless of implementation technique, parsers are rather
> delicate. A LALR-based parser generator does provide more information
> when it detects shift/reduce and reduce/reduce conflicts, but I never
> found this information useful. It was always quite the opposite of
> being helpful - an indication that a LALR parser could not handle my
> change and I had to look for workarounds.
>
>> With a combinator based parser, you basically have to do program 
>> verification, or more pragmatically, have a large test suite and hope that 
>> you tested everything.
> Even when doing modifications to Parser.y, I relied mainly on the test
> suite to determine whether my change was right (and the test suite
> always caught many issues). A large test suite is the best approach
> both for 'happy'-based parsers and for combinator-based parsers.
>
>> and then have a separate pass that validates and fixes up the results
> That's where my concern lies. This separate pass is confusing (at
> least for me - it's not the most straightforward thing to parse
> something incorrectly and then restructure it), it is hard to modify,
> it does not handle corner cases (e.g. #1087).
>
> Since we have all this Haskell code that does a significant portion of
> processing, why even bother with having a LALR pass before it?
>
>> namely we can report better parse errors
> I don't think that's true, we can achieve better error messages with
> recursive descent.
>
>> Also, with the new rewrite of HsSyn, we should be able to mark such 
>> constructors as only usable in the parsing pass, so later passes wouldn't 
>> need to worry about them.
> Not completely true, GhcPs-parametrized structures are the final
> output of parsing, so at least the renamer will face these
> constructors.
>
> On Tue, Oct 9, 2018 at 1:00 AM Iavor Diatchki <iavor.diatc...@gmail.com> 
> wrote:
>> Hello,
>>
>> my experience with complex parsers written using parsing combinators is that 
>> they tend to be quite difficult to modify and have any kind of assurance 
>> that now you haven't broken something else.   While reduce-reduce errors are 
>> indeed annoying, you at least know that there is some sort of issue you need 
>> to address.   With a combinator based parser, you basically have to do 
>> program verification, or more pragmatically, have a large test suite and 
>> hope that you tested everything.
>>
>> I think the current approach is actually quite reasonable:  use the Happy 
>> grammar to parse out the basic structure of the program, without trying to 
>> be completely precise, and then have a separate pass that validates and 
>> fixes up the results.   While this has the draw-back of some constructors 
>> being in the "wrong place", there are also benefits---namely we can report 
>> better parse errors.  Also, with the new rewrite of HsSyn, we should be able 
>> to mark such constructors as only usable in the parsing pass, so later 
>> passes wouldn't need to worry about them.
>>
>> -Iavor
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Oct 8, 2018 at 2:26 PM Simon Peyton Jones via ghc-devs 
>> <ghc-devs@haskell.org> wrote:
>>> I'm no parser expert, but a parser that was easier to understand and 
>>> modify, and was as fast as the current one, sounds good to me.
>>>
>>> It's a tricky area though; e.g. the layout rule.
>>>
>>> Worth talking to Simon Marlow.
>>>
>>> Simon
>>>
>>>
>>>
>>> | -----Original Message-----
>>> | From: ghc-devs <ghc-devs-boun...@haskell.org> On Behalf Of Vladislav
>>> | Zavialov
>>> | Sent: 08 October 2018 21:44
>>> | To: ghc-devs <ghc-devs@haskell.org>
>>> | Subject: Parser.y rewrite with parser combinators
>>> |
>>> | Hello devs,
>>> |
>>> | Recently I've been working on a couple of parsing-related issues in
>>> | GHC. I implemented support for the -XStarIsType extension, fixed
>>> | parsing of the (!) type operator (Trac #15457), allowed using type
>>> | operators in existential contexts (Trac #15675).
>>> |
>>> | Doing these tasks required way more engineering effort than I expected
>>> | from my prior experience working with parsers due to complexities of
>>> | GHC's grammar.
>>> |
>>> | In the last couple of days, I've been working on Trac #1087 - a
>>> | 12-year old parsing bug. After trying out a couple of approaches, to
>>> | my dismay I realised that fixing it properly (including support for
>>> | bang patterns inside infix constructors, etc) would require a complete
>>> | rewrite of expression and pattern parsing logic.
>>> |
>>> | Worse yet, most of the work would be done outside Parser.y in Haskell
>>> | code instead, in RdrHsSyn helpers. When I try to keep the logic inside
>>> | Parser.y, in every design direction I face reduce/reduce conflicts.
>>> |
>>> | The reduce/reduce conflicts are the worst.
>>> |
>>> | Perhaps it is finally time to admit that Haskell syntax with all of
>>> | the GHC cannot fit into a LALR grammar?
>>> |
>>> | The extent of hacks that we have right now just to make parsing
>>> | possible is astonishing. For instance, we have dedicated constructors
>>> | in HsExpr to make parsing patterns possible (EWildPat, EAsPat,
>>> | EViewPat, ELazyPat). That is, one of the fundamental types (that the
>>> | type checker operates on) has four additional constructors that exist
>>> | due to a reduce/reduce conflict between patterns and expressions.
>>> |
>>> | I propose a complete rewrite of GHC's parser to use recursive descent
>>> | parsing with monadic parser combinators.
>>> |
>>> | 1. We could significantly simplify parsing logic by doing things in a
>>> | more direct manner. For instance, instead of parsing patterns as
>>> | expressions and then post-processing them, we could have separate
>>> | parsing logic for patterns and expressions.
>>> |
>>> | 2. We could fix long-standing parsing bugs like Trac #1087 because
>>> | recursive descent offers more expressive power than LALR (at the cost
>>> | of support for left recursion, which is not much of a loss in
>>> | practice).
>>> |
>>> | 3. New extensions to the grammar would require less engineering effort.
>>> |
>>> | Of course, this rewrite is a huge chunk of work, so before I start, I
>>> | would like to know that this work would be accepted if done well.
>>> | Here's what I want to achieve:
>>> |
>>> | * Comparable performance. The new parser could turn out to be faster
>>> | because it would do less post-processing, but it could be slower
>>> | because 'happy' does all the sorts of low-level optimisations. I will
>>> | consider this project a success only if comparable performance is
>>> | achieved.
>>> |
>>> | * Correctness. The new parser should handle 100% of the syntactic
>>> | constructs that the current parser can handle.
>>> |
>>> | * Error messages. The new error messages should be of equal or better
>>> | quality than existing ones.
>>> |
>>> | * Elegance. The new parser should bring simplification to other parts
>>> | of the compiler (e.g. removal of pattern constructors from HsExpr).
>>> | And one of the design principles is to represent things by dedicated
>>> | data structures, in contrast to the current state of affairs where we
>>> | represent patterns as expressions, data constructor declarations as
>>> | types (before D5180), etc.
>>> |
>>> | Let me know if this is a good/acceptable direction of travel. That's
>>> | definitely something that I personally would like to see happen.
>>> |
>>> | All the best,
>>> | - Vladislav
>>> | _______________________________________________
>>> | ghc-devs mailing list
>>> | ghc-devs@haskell.org
>>> | https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmail.hask
>>> | ell.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fghc-
>>> | devs&amp;data=02%7C01%7Csimonpj%40microsoft.com%7C19181de5c6bd493ab07a08d
>>> | 62d5edbe0%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636746282778542095
>>> | &amp;sdata=lFRt1t4k3BuuRdyOqwOYTZcLPRB%2BtFJwfFtgMpNLxW0%3D&amp;reserved=
>>> | 0
>>> _______________________________________________
>>> ghc-devs mailing list
>>> ghc-devs@haskell.org
>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
> _______________________________________________
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

signature.asc
Description: OpenPGP digital signature

_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: Parser.y rewrite with parser combinators

Reply via email to