On Wed, May 6, 2015 at 6:08 AM, Herbert Valerio Riedel <hvrie...@gmail.com> wrote: > Hello *, > > As you may be aware, GHC's `{-# LANGUAGE CPP #-}` language extension > currently relies on the system's C-compiler bundled `cpp` program to > provide a "traditional mode" c-preprocessor. > > This has caused several problems in the past, since parsing Haskell code > with a preprocessor mode designed for use with C's tokenizer has caused > already quite some problems[1] in the past. I'd like to see GHC 7.12 > adopt an implemntation of `-XCPP` that does not rely on the shaky > system-`cpp` foundation. To this end I've created a wiki page > > https://ghc.haskell.org/trac/ghc/wiki/Proposal/NativeCpp > > to describe the actual problems in more detail, and a couple of possible > ways forward. Ideally, we'd simply integrate `cpphs` into GHC > (i.e. "plan 2"). However, due to `cpp`s non-BSD3 license this should be > discussed and debated since affects the overall-license of the GHC > code-base, which may or may not be a problem to GHC's user-base (and > that's what I hope this discussion will help to find out). > > So please go ahead and read the Wiki page... and then speak your mind!
Thanks for writing this up, btw! It's nice to put the mumblings we've had for a while down 'on paper'. > > Thanks, > HVR > > > [1]: ...does anybody remember the issues Haskell packages (& GHC) > encountered when Apple switched to the Clang tool-chain, thereby > causing code using `-XCPP` to suddenly break due to subtly > different `cpp`-semantics? There are two (major) differences I can list, although I can only provide some specific examples OTTOMH: 1) Clang is more strict wrt language specifications. For example, GCC is lenient and allows a space between a macro identifier and the parenthesis denoting a parameter list; so saying 'FOO (x, y)' is valid with GCC (where FOO is a macro), but not with Clang. Sometimes this trips up existing code, but I've mostly seen it in GHC itself. 2) The lexing rules for C and Haskell simply are not the same in general. For example, what should "FOO(a' + b')" parse to? Well, in Haskell, 'prime' is a valid component from an identifier and in this case the parse should be "a prime + b prime", but in C the ' character is identified as beginning the start of a single-character literal, and a strict preprocessor like Clang's will reject that. In practice, I think people have mostly just avoided arcane lexer behaviors that don't work, and the only reason this was never a problem was because GCC or some variant was always the 'standard' C compiler GHC could rely on. > _______________________________________________ > Glasgow-haskell-users mailing list > Glasgow-haskell-users@haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/glasgow-haskell-users > -- Regards, Austin Seipp, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/ _______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/glasgow-haskell-users