https://github.com/Rinn created https://github.com/llvm/llvm-project/pull/205084

When preprocessing a source file with \r\n newlines on Windows while preserving 
comments, any newlines inside a multiline block comment are written to the 
output verbatim. When the newlines are later normalized, each of these becomes 
\r\r\n, which leaves an extra carriage return. I found this when I noticed that 
the PVS-Studio static code analyzer was miscounting the line numbers for 
warnings because it treated the extra carriage return as an extra newline. I 
have reported it to them and they're going to fix their tool, but this still 
seems like incorrect behavior for clang.
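
To illustrate how the stray carriage return appears, here is a minimal sketch. It assumes the later normalization step behaves like text-mode output on Windows, expanding every \n to \r\n and passing all other bytes through. `translateTextMode` is a hypothetical helper written for this illustration, not a clang or LLVM function.

```cpp
#include <string>
#include <string_view>

// Hypothetical stand-in for text-mode newline normalization (an assumption,
// not clang code): every '\n' is expanded to "\r\n", and all other bytes,
// including any '\r' already present, are copied through unchanged.
std::string translateTextMode(std::string_view In) {
  std::string Out;
  Out.reserve(In.size());
  for (char C : In) {
    if (C == '\n')
      Out += '\r'; // the carriage return added by normalization
    Out += C;
  }
  return Out;
}
```

Under that assumption, a comment spelling that still contains \r\n, such as `"/* a\r\n b */"`, comes out as `"/* a\r\r\n b */"`, while one containing only \n comes out with a clean \r\n.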

I don't know if this is really the best way to solve this, but I basically 
duplicated the logic from PrintPPOutputPPCallbacks::HandleNewlinesInToken and 
applied it when printing comment tokens.

I've attached a reproducer for this, which I modified to write the output .i file:
[repro-66b710.zip](https://github.com/user-attachments/files/29203984/repro-66b710.zip)


>From 25db0f0051754ebc88a1b42438d4d593b486f934 Mon Sep 17 00:00:00 2001
From: Joe Kirchoff <[email protected]>
Date: Mon, 22 Jun 2026 12:29:50 +0100
Subject: [PATCH] Fix preprocessed block comment newline on windows

When preprocessing a source file on windows with \r\n newlines while preserving 
comments, any newlines in a block comment are written directly which is then 
later normalized to \r\r\n. Fix this by only writing out \n in 
PrintPreprocessedTokens for those comments by copying the logic from 
PrintPPOutputPPCallbacks::HandleNewlinesInToken
---
 .../lib/Frontend/PrintPreprocessedOutput.cpp  | 30 +++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/clang/lib/Frontend/PrintPreprocessedOutput.cpp b/clang/lib/Frontend/PrintPreprocessedOutput.cpp
index 02266882c4c4a..16c09d1263e30 100644
--- a/clang/lib/Frontend/PrintPreprocessedOutput.cpp
+++ b/clang/lib/Frontend/PrintPreprocessedOutput.cpp
@@ -889,6 +889,26 @@ struct UnknownPragmaHandler : public PragmaHandler {
 };
 } // end anonymous namespace
 
+static void PrintPreprocessedComment(raw_ostream *OS, const char *TokStr,
+                                     unsigned Len) {
+  for (; Len; --Len, ++TokStr) {
+    if (*TokStr != '\n' &&
+        *TokStr != '\r') {
+      *OS << *TokStr;
+      continue;
+    }
+
+    *OS << '\n';
+
+    // If we have \n\r or \r\n, skip both and emit one newline.
+    if (Len != 1 &&
+        (TokStr[1] == '\n' || TokStr[1] == '\r') &&
+        TokStr[0] != TokStr[1]) {
+      ++TokStr;
+      --Len;
+    }
+  }
+}
 
 static void PrintPreprocessedTokens(Preprocessor &PP, Token &Tok,
                                     PrintPPOutputPPCallbacks *Callbacks) {
@@ -1022,7 +1042,10 @@ static void PrintPreprocessedTokens(Preprocessor &PP, Token &Tok,
     } else if (Tok.getLength() < std::size(Buffer)) {
       const char *TokPtr = Buffer;
       unsigned Len = PP.getSpelling(Tok, TokPtr);
-      Callbacks->OS->write(TokPtr, Len);
+      if (Tok.is(tok::comment))
+        PrintPreprocessedComment(Callbacks->OS, TokPtr, Len);
+      else
+        Callbacks->OS->write(TokPtr, Len);
 
       // Tokens that can contain embedded newlines need to adjust our current
       // line number.
@@ -1039,7 +1062,10 @@ static void PrintPreprocessedTokens(Preprocessor &PP, Token &Tok,
       }
     } else {
       std::string S = PP.getSpelling(Tok);
-      Callbacks->OS->write(S.data(), S.size());
+      if (Tok.is(tok::comment))
+        PrintPreprocessedComment(Callbacks->OS, S.data(), S.size());
+      else
+        Callbacks->OS->write(S.data(), S.size());
 
       // Tokens that can contain embedded newlines need to adjust our current
       // line number.

_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
