https://github.com/Thibault-Monnier created 
https://github.com/llvm/llvm-project/pull/171914

This change attempts to maximize usage of the SSE fast path in 
`fastParseASCIIIdentifier`.

If the binary is compiled with SSE4.2 enabled, then the behavior is the exact 
same, ensuring we have no regressions.

If not, we compile both the SSE fast path and the scalar loop. At runtime, we 
check if SSE4.2 is available by using `__builtin_cpu_supports`, and dispatch to 
the right function. If it _is_ available, this allows a net performance 
improvement. Otherwise, there's a very slight but negligible regression... I 
believe that's perfectly reasonable for a non-SSE4.2-supporting processor.

I checked locally on an old processor with QEMU to ensure this doesn't break 
compatibility; I'll need help to write the real tests though.

The benchmark results are available at 
[llvm-compile-time-tracker](https://llvm-compile-time-tracker.com/compare.php?from=f88d060c4176d17df56587a083944637ca865cb3&to=c2cf8e936d72720eafd32d3b25e85a784112b226&stat=instructions%3Au).

>From 4fc9a07698e1a4627a050ba6fa9df3f1f8725451 Mon Sep 17 00:00:00 2001
From: Thibault-Monnier <[email protected]>
Date: Thu, 11 Dec 2025 22:02:35 +0100
Subject: [PATCH] Detect sse4.2 availability at runtime to use it on modern
 processors

---
 clang/lib/Lex/Lexer.cpp | 35 ++++++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/clang/lib/Lex/Lexer.cpp b/clang/lib/Lex/Lexer.cpp
index b282a600c0e56..3b8fa0b9b7f36 100644
--- a/clang/lib/Lex/Lexer.cpp
+++ b/clang/lib/Lex/Lexer.cpp
@@ -46,9 +46,7 @@
 #include <string>
 #include <tuple>
 
-#ifdef __SSE4_2__
 #include <nmmintrin.h>
-#endif
 
 using namespace clang;
 
@@ -1921,9 +1919,17 @@ bool Lexer::LexUnicodeIdentifierStart(Token &Result, 
uint32_t C,
 }
 
 static const char *
-fastParseASCIIIdentifier(const char *CurPtr,
-                         [[maybe_unused]] const char *BufferEnd) {
-#ifdef __SSE4_2__
+fastParseASCIIIdentifierScalar(const char *CurPtr,
+                               [[maybe_unused]] const char *BufferEnd) {
+  unsigned char C = *CurPtr;
+  while (isAsciiIdentifierContinue(C))
+    C = *++CurPtr;
+  return CurPtr;
+}
+
+__attribute__((target("sse4.2"))) static const char *
+fastParseASCIIIdentifierSSE42(const char *CurPtr,
+                              [[maybe_unused]] const char *BufferEnd) {
   alignas(16) static constexpr char AsciiIdentifierRange[16] = {
       '_', '_', 'A', 'Z', 'a', 'z', '0', '9',
   };
@@ -1943,12 +1949,23 @@ fastParseASCIIIdentifier(const char *CurPtr,
       continue;
     return CurPtr;
   }
+
+  return fastParseASCIIIdentifierScalar(CurPtr, BufferEnd);
+}
+
+static bool supportsSSE42() {
+  static bool SupportsSSE42 = __builtin_cpu_supports("sse4.2");
+  return SupportsSSE42;
+}
+
+static const char *fastParseASCIIIdentifier(const char *CurPtr,
+                                            const char *BufferEnd) {
+#ifndef __SSE4_2__
+  if (LLVM_UNLIKELY(!supportsSSE42()))
+    return fastParseASCIIIdentifierScalar(CurPtr, BufferEnd);
 #endif
 
-  unsigned char C = *CurPtr;
-  while (isAsciiIdentifierContinue(C))
-    C = *++CurPtr;
-  return CurPtr;
+  return fastParseASCIIIdentifierSSE42(CurPtr, BufferEnd);
 }
 
 bool Lexer::LexIdentifierContinue(Token &Result, const char *CurPtr) {

_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to