Unfortunately, the current parsing rules for Haskell aren't fully phase-separable like this.
If you look at the rules for Layout token insertion in the Haskell report the 9th rule requires that in the event the parser encounters a parse error it should insert a virtual close brace and continue on! Otherwise you couldn't parse things like *let **{** foo = bar **}** in baz *where the {}'s are virtual without reframing *let* and *in* as a different kind of paired opening and closing brace or using other hacks in the grammar. It is quite difficult to hack around all the ways parses can go wrong. The main downside this has from a language standpoint is you simply can't properly lex Haskell without more or less fully parsing Haskell. -Edward On Wed, Aug 18, 2021 at 7:22 AM Kwanghoon Choi <lazysw...@gmail.com> wrote: > > Hi, > > I have recently been playing with GHC's Lexer.lexer in the ghc-parser-lib > package. > > Given > > module HelloWorld where > > main = putStrLn "Hello World!\n" > > it produces > > stack exec -- lexer-exe ./examples/HelloWorld.hs > Lexing&Parsing: ./examples/HelloWorld.hs > module at (1, 1): module > CONID at (1, 8): CONID > where at (1, 19): where > vocurly at (3, 1): vocurly <==== { is inserted automatically!! > VARID at (3, 1): VARID > = at (3, 6): = > VARID at (3, 8): VARID > STRING at (3, 17): STRING > ; at (4, 1): ; > > By the example above, the lexer automatically inserts an opening brace > (i.e. vocurly) right after 'where'. But it does not insert a matching > closing brace (i.e., vccurly), which would lead to a failure in parsing a > list of tokens produced by the lexer. > > My question is how to use the GHC lexer to produce closing braces as well. > > All my code is available > - https://github.com/kwanghoon/hslexer > > To save your time, the relevant part of the code is as follows: > > In app/HaskellLexer.hs, > > singleHaskellToken :: P (Located Token) > singleHaskellToken = > Lexer.lexer False > (\locatedToken -> P (\pstate -> POk pstate locatedToken)) > > tokInfos :: [Terminal Token] -> P (Line, Column, [Terminal Token]) > tokInfos s = do > locatedToken <- singleHaskellToken > case locatedToken of > L srcspan ITeof -> > let (start_line, start_col, end_line, end_col) = > srcSpanToLineCol srcspan in > return (end_line, end_col, s) > > L srcspan tok -> > let (start_line, start_col, end_line, end_col) = > srcSpanToLineCol srcspan in > tokInfos (Terminal (fromToken tok) start_line start_col (Just > tok) : s) > > Thanks in advance > > Best regards, > > Kwanghoon > > > > > _______________________________________________ > ghc-devs mailing list > ghc-devs@haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs