http://gcc.gnu.org/bugzilla/show_bug.cgi?id=8270
--- Comment #52 from GoWhoopee at yahoo dot com ---
Whitespace is required by Translation Phase 3; consequently, Translation Phase 1 should not be changing whitespace at all, only mapping multibyte characters and trigraphs.

Comment #39 indicates that gcc is known to work incorrectly: "This (removal of such spaces) is part of how GCC defines the implementation-defined mapping in translation phase 1." The removal of white-space is not mapping multibyte characters or trigraphs; it is removing critical information from Translation Phases 2 and 3, resulting in misinterpretation of the source code.

Looking at the 4.8.2 source, libcpp\lex.c line 1427, there is a fix when parsing raw strings, after the event:
______________________________________________
static void
lex_raw_string (cpp_reader *pfile, cpp_token *token, const uchar *base,
                const uchar *cur)
{
[...]
          switch (note->type)
            {
            case '\\':
            case ' ':
              /* Restore backslash followed by newline.  */
              BUF_APPEND (base, cur - base);
              base = cur;
              BUF_APPEND ("\\", 1);
            after_backslash:
              if (note->type == ' ')
                {
                  /* GNU backslash whitespace newline extension.  FIXME
                     could be any sequence of non-vertical space.  When we
                     can properly restore any such sequence, we should mark
                     this note as handled so _cpp_process_line_notes
                     doesn't warn.  */
                  BUF_APPEND (" ", 1);
                }

              BUF_APPEND ("\n", 1);
              break;
______________________________________________
but fixing all the varieties of broken things after the event wouldn't be necessary if Translation Phase 1 didn't trim whitespace.

If Translation Phase 1 is required to trim whitespace for some reason (performance, perhaps), then it should trim multiple consecutive spaces down to exactly one space, which wouldn't break Translation Phases 2 and 3.

Does that sound like a sensible compromise?
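
To make the proposed compromise concrete, here is a minimal sketch of what "trim down to exactly one space" could look like. It is not libcpp code: the names collapse_trailing_blanks and has_line_splice are hypothetical, and the sketch assumes the trimming applies only to runs of spaces and tabs that directly precede a newline. It shows that a backslash followed by blanks and a newline stays distinguishable from a true backslash-newline after trimming, so Phase 2 would not splice the line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libcpp.  Copies src to dst, shrinking
   each run of spaces/tabs that directly precedes a newline (or the end of
   the input) down to exactly one space.  Blanks elsewhere are copied
   unchanged.  Returns the output length, or (size_t)-1 if dst is too small. */
size_t collapse_trailing_blanks(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;
    size_t i = 0;

    assert(src != NULL && dst != NULL);

    while (src[i] != '\0') {
        if (src[i] == ' ' || src[i] == '\t') {
            /* Find the end of this run of blanks. */
            size_t j = i;
            while (src[j] == ' ' || src[j] == '\t')
                j++;

            if (src[j] == '\n' || src[j] == '\0') {
                /* Trailing run: keep exactly one space. */
                if (out + 1 >= dst_size)
                    return (size_t)-1;
                dst[out++] = ' ';
                i = j;
                continue;
            }

            /* Interior run: copy it through untouched. */
            while (i < j) {
                if (out + 1 >= dst_size)
                    return (size_t)-1;
                dst[out++] = src[i++];
            }
            continue;
        }

        if (out + 1 >= dst_size)
            return (size_t)-1;
        dst[out++] = src[i++];
    }

    if (dst_size == 0)
        return (size_t)-1;
    dst[out] = '\0';
    return out;
}

/* Phase 2 rule as worded in the standard: a line splice is a backslash
   immediately followed by a newline.  Returns nonzero if text contains one. */
int has_line_splice(const char *text)
{
    return strstr(text, "\\\n") != NULL;
}
```

For example, feeding it a comment line that ends in a backslash, a space, a tab and two more spaces yields a line ending in a backslash and a single space, and has_line_splice reports no splice for the result. Whether one space is really enough for every later phase (raw strings included) is exactly the question being asked above.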