https://github.com/Rinn created https://github.com/llvm/llvm-project/pull/205084

When preprocessing a source file on Windows with \r\n newlines while preserving comments, any newlines within a multiline block comment are written out directly. When the newlines are later normalized, these become \r\r\n, which leaves an extra carriage return.

I found this when I noticed that the PVS Studio static code analyzer was miscounting line numbers for warnings, because it treated the extra carriage return as an extra newline. I have reported it to them and they're going to fix their tool, but this still seems like incorrect logic for clang.

I don't know if this is really the best way to solve this, but I basically just duplicated the logic from PrintPPOutputPPCallbacks::HandleNewlinesInToken when printing comment tokens.

I've attached a reproducer for this, which I modified to write the output .i file:
[repro-66b710.zip](https://github.com/user-attachments/files/29203984/repro-66b710.zip)

>From 25db0f0051754ebc88a1b42438d4d593b486f934 Mon Sep 17 00:00:00 2001
From: Joe Kirchoff <[email protected]>
Date: Mon, 22 Jun 2026 12:29:50 +0100
Subject: [PATCH] Fix preprocessed block comment newline on windows

When preprocessing a source file on windows with \r\n newlines while
preserving comments, any newlines in a block comment are written directly
which is then later normalized to \r\r\n.

Fix this by only writing out \n in PrintPreprocessedTokens for those comments
by copying the logic from PrintPPOutputPPCallbacks::HandleNewlinesInToken
---
 .../lib/Frontend/PrintPreprocessedOutput.cpp  | 30 +++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/clang/lib/Frontend/PrintPreprocessedOutput.cpp b/clang/lib/Frontend/PrintPreprocessedOutput.cpp
index 02266882c4c4a..16c09d1263e30 100644
--- a/clang/lib/Frontend/PrintPreprocessedOutput.cpp
+++ b/clang/lib/Frontend/PrintPreprocessedOutput.cpp
@@ -889,6 +889,26 @@ struct UnknownPragmaHandler : public PragmaHandler {
 };
 } // end anonymous namespace
 
+static void PrintPreprocessedComment(raw_ostream *OS, const char *TokStr,
+                                     unsigned Len) {
+  for (; Len; --Len, ++TokStr) {
+    if (*TokStr != '\n' &&
+        *TokStr != '\r') {
+      *OS << *TokStr;
+      continue;
+    }
+
+    *OS << '\n';
+
+    // If we have \n\r or \r\n, skip both and emit one newline.
+    if (Len != 1 &&
+        (TokStr[1] == '\n' || TokStr[1] == '\r') &&
+        TokStr[0] != TokStr[1]) {
+      ++TokStr;
+      --Len;
+    }
+  }
+}
 
 static void PrintPreprocessedTokens(Preprocessor &PP, Token &Tok,
                                     PrintPPOutputPPCallbacks *Callbacks) {
@@ -1022,7 +1042,10 @@ static void PrintPreprocessedTokens(Preprocessor &PP, Token &Tok,
     } else if (Tok.getLength() < std::size(Buffer)) {
       const char *TokPtr = Buffer;
       unsigned Len = PP.getSpelling(Tok, TokPtr);
-      Callbacks->OS->write(TokPtr, Len);
+      if (Tok.is(tok::comment))
+        PrintPreprocessedComment(Callbacks->OS, TokPtr, Len);
+      else
+        Callbacks->OS->write(TokPtr, Len);
 
       // Tokens that can contain embedded newlines need to adjust our current
       // line number.
@@ -1039,7 +1062,10 @@ static void PrintPreprocessedTokens(Preprocessor &PP, Token &Tok,
         }
       }
     } else {
       std::string S = PP.getSpelling(Tok);
-      Callbacks->OS->write(S.data(), S.size());
+      if (Tok.is(tok::comment))
+        PrintPreprocessedComment(Callbacks->OS, S.data(), S.size());
+      else
+        Callbacks->OS->write(S.data(), S.size());
 
       // Tokens that can contain embedded newlines need to adjust our current
       // line number.

_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
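
For anyone who wants to check the behaviour outside of clang, here is a minimal standalone sketch of the newline handling described in this patch. It is not the patch itself and does not use any clang APIs: `normalizeCommentNewlines` and `expandToCRLF` are hypothetical names, and `expandToCRLF` only models the assumption that a later stage rewrites every \n as \r\n.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of clang): emit a single '\n' for every line
// ending in Text, treating \r\n and \n\r as one line ending each.
std::string normalizeCommentNewlines(const std::string &Text) {
  std::string Out;
  Out.reserve(Text.size());
  for (std::size_t I = 0, E = Text.size(); I != E; ++I) {
    const char C = Text[I];
    if (C != '\n' && C != '\r') {
      Out.push_back(C);
      continue;
    }

    Out.push_back('\n');

    // A mixed pair (\r\n or \n\r) counts as one newline, so skip its second
    // character. Two identical characters (\n\n or \r\r) stay two newlines.
    if (I + 1 != E && (Text[I + 1] == '\n' || Text[I + 1] == '\r') &&
        Text[I + 1] != C)
      ++I;
  }
  return Out;
}

// Hypothetical model of the later normalization step: every '\n' is written
// as "\r\n" and all other characters, including '\r', pass through unchanged.
std::string expandToCRLF(const std::string &Text) {
  std::string Out;
  Out.reserve(Text.size());
  for (const char C : Text) {
    if (C == '\n')
      Out.push_back('\r');
    Out.push_back(C);
  }
  return Out;
}
```

Under that assumption, passing a block comment with \r\n line endings straight to `expandToCRLF` produces \r\r\n, while running it through `normalizeCommentNewlines` first round-trips back to plain \r\n.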
