> complex parsers written using parsing combinators is that they tend to be > quite difficult to modify and have any kind of assurance that now you haven't > broken something else
That's true regardless of implementation technique, parsers are rather delicate. A LALR-based parser generator does provide more information when it detects shift/reduce and reduce/reduce conflicts, but I never found this information useful. It was always quite the opposite of being helpful - an indication that a LALR parser could not handle my change and I had to look for workarounds. > With a combinator based parser, you basically have to do program > verification, or more pragmatically, have a large test suite and hope that > you tested everything. Even when doing modifications to Parser.y, I relied mainly on the test suite to determine whether my change was right (and the test suite always caught many issues). A large test suite is the best approach both for 'happy'-based parsers and for combinator-based parsers. > and then have a separate pass that validates and fixes up the results That's where my concern lies. This separate pass is confusing (at least for me - it's not the most straightforward thing to parse something incorrectly and then restructure it), it is hard to modify, it does not handle corner cases (e.g. #1087). Since we have all this Haskell code that does a significant portion of processing, why even bother with having a LALR pass before it? > namely we can report better parse errors I don't think that's true, we can achieve better error messages with recursive descent. > Also, with the new rewrite of HsSyn, we should be able to mark such > constructors as only usable in the parsing pass, so later passes wouldn't > need to worry about them. Not completely true, GhcPs-parametrized structures are the final output of parsing, so at least the renamer will face these constructors. On Tue, Oct 9, 2018 at 1:00 AM Iavor Diatchki <iavor.diatc...@gmail.com> wrote: > > Hello, > > my experience with complex parsers written using parsing combinators is that > they tend to be quite difficult to modify and have any kind of assurance that > now you haven't broken something else. While reduce-reduce errors are > indeed annoying, you at least know that there is some sort of issue you need > to address. With a combinator based parser, you basically have to do > program verification, or more pragmatically, have a large test suite and hope > that you tested everything. > > I think the current approach is actually quite reasonable: use the Happy > grammar to parse out the basic structure of the program, without trying to be > completely precise, and then have a separate pass that validates and fixes up > the results. While this has the draw-back of some constructors being in the > "wrong place", there are also benefits---namely we can report better parse > errors. Also, with the new rewrite of HsSyn, we should be able to mark such > constructors as only usable in the parsing pass, so later passes wouldn't > need to worry about them. > > -Iavor > > > > > > > > > > > > On Mon, Oct 8, 2018 at 2:26 PM Simon Peyton Jones via ghc-devs > <ghc-devs@haskell.org> wrote: >> >> I'm no parser expert, but a parser that was easier to understand and modify, >> and was as fast as the current one, sounds good to me. >> >> It's a tricky area though; e.g. the layout rule. >> >> Worth talking to Simon Marlow. >> >> Simon >> >> >> >> | -----Original Message----- >> | From: ghc-devs <ghc-devs-boun...@haskell.org> On Behalf Of Vladislav >> | Zavialov >> | Sent: 08 October 2018 21:44 >> | To: ghc-devs <ghc-devs@haskell.org> >> | Subject: Parser.y rewrite with parser combinators >> | >> | Hello devs, >> | >> | Recently I've been working on a couple of parsing-related issues in >> | GHC. I implemented support for the -XStarIsType extension, fixed >> | parsing of the (!) type operator (Trac #15457), allowed using type >> | operators in existential contexts (Trac #15675). >> | >> | Doing these tasks required way more engineering effort than I expected >> | from my prior experience working with parsers due to complexities of >> | GHC's grammar. >> | >> | In the last couple of days, I've been working on Trac #1087 - a >> | 12-year old parsing bug. After trying out a couple of approaches, to >> | my dismay I realised that fixing it properly (including support for >> | bang patterns inside infix constructors, etc) would require a complete >> | rewrite of expression and pattern parsing logic. >> | >> | Worse yet, most of the work would be done outside Parser.y in Haskell >> | code instead, in RdrHsSyn helpers. When I try to keep the logic inside >> | Parser.y, in every design direction I face reduce/reduce conflicts. >> | >> | The reduce/reduce conflicts are the worst. >> | >> | Perhaps it is finally time to admit that Haskell syntax with all of >> | the GHC cannot fit into a LALR grammar? >> | >> | The extent of hacks that we have right now just to make parsing >> | possible is astonishing. For instance, we have dedicated constructors >> | in HsExpr to make parsing patterns possible (EWildPat, EAsPat, >> | EViewPat, ELazyPat). That is, one of the fundamental types (that the >> | type checker operates on) has four additional constructors that exist >> | due to a reduce/reduce conflict between patterns and expressions. >> | >> | I propose a complete rewrite of GHC's parser to use recursive descent >> | parsing with monadic parser combinators. >> | >> | 1. We could significantly simplify parsing logic by doing things in a >> | more direct manner. For instance, instead of parsing patterns as >> | expressions and then post-processing them, we could have separate >> | parsing logic for patterns and expressions. >> | >> | 2. We could fix long-standing parsing bugs like Trac #1087 because >> | recursive descent offers more expressive power than LALR (at the cost >> | of support for left recursion, which is not much of a loss in >> | practice). >> | >> | 3. New extensions to the grammar would require less engineering effort. >> | >> | Of course, this rewrite is a huge chunk of work, so before I start, I >> | would like to know that this work would be accepted if done well. >> | Here's what I want to achieve: >> | >> | * Comparable performance. The new parser could turn out to be faster >> | because it would do less post-processing, but it could be slower >> | because 'happy' does all the sorts of low-level optimisations. I will >> | consider this project a success only if comparable performance is >> | achieved. >> | >> | * Correctness. The new parser should handle 100% of the syntactic >> | constructs that the current parser can handle. >> | >> | * Error messages. The new error messages should be of equal or better >> | quality than existing ones. >> | >> | * Elegance. The new parser should bring simplification to other parts >> | of the compiler (e.g. removal of pattern constructors from HsExpr). >> | And one of the design principles is to represent things by dedicated >> | data structures, in contrast to the current state of affairs where we >> | represent patterns as expressions, data constructor declarations as >> | types (before D5180), etc. >> | >> | Let me know if this is a good/acceptable direction of travel. That's >> | definitely something that I personally would like to see happen. >> | >> | All the best, >> | - Vladislav >> | _______________________________________________ >> | ghc-devs mailing list >> | ghc-devs@haskell.org >> | https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmail.hask >> | ell.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fghc- >> | devs&data=02%7C01%7Csimonpj%40microsoft.com%7C19181de5c6bd493ab07a08d >> | 62d5edbe0%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636746282778542095 >> | &sdata=lFRt1t4k3BuuRdyOqwOYTZcLPRB%2BtFJwfFtgMpNLxW0%3D&reserved= >> | 0 >> _______________________________________________ >> ghc-devs mailing list >> ghc-devs@haskell.org >> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs