Unfortunately, the current parsing rules for Haskell aren't fully
phase-separable like this.

If you look at the rules for Layout token insertion in the Haskell report
the 9th rule requires that in the event the parser encounters a parse
error it should insert a virtual close brace and continue on!

Otherwise you couldn't parse things like *let **{** foo = bar **}** in
baz *where
the {}'s are virtual without reframing *let* and *in* as a different kind
of paired opening and closing brace or using other hacks in the grammar. It
is quite difficult to hack around all the ways parses can go wrong.

The main downside this has from a language standpoint is you simply can't
properly lex Haskell without more or less fully parsing Haskell.

-Edward

On Wed, Aug 18, 2021 at 7:22 AM Kwanghoon Choi <lazysw...@gmail.com> wrote:

>
> Hi,
>
> I have recently been playing with GHC's Lexer.lexer in the ghc-parser-lib
> package.
>
> Given
>
>    module HelloWorld where
>
>    main = putStrLn "Hello World!\n"
>
> it produces
>
>    stack exec -- lexer-exe ./examples/HelloWorld.hs
>    Lexing&Parsing: ./examples/HelloWorld.hs
>    module at (1, 1): module
>    CONID at (1, 8): CONID
>    where at (1, 19): where
>    vocurly at (3, 1): vocurly    <==== { is inserted automatically!!
>    VARID at (3, 1): VARID
>    = at (3, 6): =
>    VARID at (3, 8): VARID
>    STRING at (3, 17): STRING
>    ; at (4, 1): ;
>
> By the example above, the lexer automatically inserts an opening brace
> (i.e. vocurly) right after 'where'. But it does not insert a matching
> closing brace (i.e., vccurly), which would lead to a failure in parsing a
> list of tokens produced by the lexer.
>
> My question is how to use the GHC lexer to produce closing braces as well.
>
> All my code is available
>  - https://github.com/kwanghoon/hslexer
>
> To save your time, the relevant part of the code is as follows:
>
> In app/HaskellLexer.hs,
>
>     singleHaskellToken :: P (Located Token)
>     singleHaskellToken =
>       Lexer.lexer False
>         (\locatedToken -> P (\pstate -> POk pstate locatedToken))
>
>     tokInfos :: [Terminal Token] -> P (Line, Column, [Terminal Token])
>     tokInfos s = do
>       locatedToken <- singleHaskellToken
>       case locatedToken of
>         L srcspan ITeof ->
>           let (start_line, start_col, end_line, end_col) =
> srcSpanToLineCol srcspan in
>           return (end_line, end_col, s)
>
>         L srcspan tok ->
>           let (start_line, start_col, end_line, end_col) =
> srcSpanToLineCol srcspan in
>           tokInfos (Terminal (fromToken tok) start_line start_col (Just
> tok) : s)
>
> Thanks in advance
>
> Best regards,
>
> Kwanghoon
>
>
>
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Reply via email to