nickdesaulniers created this revision.
nickdesaulniers added reviewers: rsmith, bkramer.
Herald added subscribers: cfe-commits, jfb, kristof.beyls.
Herald added a project: clang.

A seemingly innocuous Linux kernel change [0] seemingly blew up our
compile times by over 3x, as reported by @nathanchance in [1].

The code in question uses a doubly nested macro containing GNU C
statement expressions that are then passed to typeof(), which is then
used in a very important macro for atomic variable access throughout
most of the kernel. The inner most macro, is passed a GNU C statement
expression.  In this case, we have macro arguments that are GNU C
statement expressions, which can contain a significant number of tokens.
The upstream kernel patch caused significant build time regressions for
both Clang and GCC. Since then, some of the nesting has been removed via
@melver, which helps gain back most of the lost compilation time. [2]

Profiles collected [3] from compilations of the slowest TU for us in the
kernel show:

- 51.4% time spent in clang::TokenLexer::updateLocForMacroArgTokens
- 48.7% time spent in clang::SourceManager::getFileIDLocal
- 35.5% time spent in clang::SourceManager::isOffsetInFileID

(mostly calls from the former through to the latter).

So it seems we have a pathological case for which properly tracking the
SourceLocation of macro arguments is significantly harming build
performance. This stands out in referenced flame graph.

In fact, this case was identified previously as being problematic in
commit 3339c568c4 ("[Lex] Speed up updateConsecutiveMacroArgTokens (NFC)")

Looking at the above call chain, there's 3 things we can do to speed up
this case.

1. TokenLexer::updateConsecutiveMacroArgTokens() calls 
SourceManager::isWrittenInSameFile() which calls SourceManager::getFileID(), 
which is both very hot and very expensive to call. SourceManger has a one entry 
cache, member LastFileIDLookup. If that isn't the FileID for a give source 
location offset, we fall back to a linear probe, and then to a binary search 
for the FileID. These fallbacks update the one entry cache, but noticeably they 
do not for the case of macro expansions!

  For the slowest TU to compile in the Linux kernel, it seems that we miss 
about 78.67% of the 68 million queries we make to getFileIDLocal that we could 
have had cache hits for, had we saved the macro expansion source location's 
FileID in the one entry cache. [4]

  I tried adding a separate cache item for macro expansions, and to check that 
before the linear then binary search fallbacks, but did not find it faster than 
simply allowing macro expansions into the one item cache.  This alone nets us 
back a lot of the performance loss.

  That said, this is a modification of caching logic, which is playing with a 
double edged sword.  While it significantly improves the pathological case, its 
hard to say that there's not an equal but opposite pathological case that isn't 
regressed by this change. Though non-pathological cases of builds of the Linux 
kernel before [0] are only slightly improved (<1%) and builds of LLVM itself 
don't change due to this patch.

  Should future travelers find this change to significantly harm their build 
times, I encourage them to feel empowered to revert this change.

2. SourceManager::getFileIDLocal has a FIXME hinting that the call to 
SourceManager::isOffsetInFileID could be made much faster since 
isOffsetInFileID is generic in the sense that it tries to handle the more 
generic case of "local" (as opposed to "loaded") files, though the caller has 
already determined the file to be local. This patch implements a new method 
that specialized for use when the caller already knows the file is local, then 
use that in TokenLexer::updateLocForMacroArgTokens.  This should be less 
controversial than 1, and is likely an across the board win. It's much less 
significant for the pathological case, but still a measurable win once we have 
fallen to the final case of binary search.

3. A bunch of methods in SourceManager take a default argument. 
SourceManager::getLocalSLocEntry doesn't do anything with this argument, yet 
many callers of getLocalSLocEntry setup, pass, then check this argument. This 
is wasted work. We can't remove the parameter outright, as it's part of an 
interface expected by getInclusions() in libclang.

With this patch applied, the above profile [5] for the same pathological
input looks like:

- 25.1% time spent in clang::TokenLexer::updateLocForMacroArgTokens
- 17.2% time spent in clang::SourceManager::getFileIDLocal

and clang::SourceManager::isOffsetInFileID is no longer called, and thus
falls out of the profile.

There may be further improvements to the general problem of "what
interval contains one number out of millions" than the current use of a
one item cache, followed by linear probing, followed by binary
searching. We might even be able to do something smarter in
TokenLexer::updateLocForMacroArgTokens.

[0] 
https://github.com/ClangBuiltLinux/linux/commit/cdd28ad2d8110099e43527e96d059c5639809680
[1] https://github.com/ClangBuiltLinux/linux/issues/1032
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=locking/kcsan&id=a5dead405f6be1fb80555bdcb77c406bf133fdc8
[3] https://github.com/ClangBuiltLinux/linux/issues/1032#issuecomment-633712667
[4] https://github.com/ClangBuiltLinux/linux/issues/1032#issuecomment-633741923
[5] https://github.com/ClangBuiltLinux/linux/issues/1032#issuecomment-634932736


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D80681

Files:
  clang/include/clang/Basic/SourceManager.h
  clang/lib/Basic/SourceManager.cpp

Index: clang/lib/Basic/SourceManager.cpp
===================================================================
--- clang/lib/Basic/SourceManager.cpp
+++ clang/lib/Basic/SourceManager.cpp
@@ -861,11 +861,8 @@
     --I;
     if (I->getOffset() <= SLocOffset) {
       FileID Res = FileID::get(int(I - LocalSLocEntryTable.begin()));
-
-      // If this isn't an expansion, remember it.  We have good locality across
-      // FileID lookups.
-      if (!I->isExpansion())
-        LastFileIDLookup = Res;
+      // Remember it.  We have good locality across FileID lookups.
+      LastFileIDLookup = Res;
       NumLinearScans += NumProbes+1;
       return Res;
     }
@@ -882,11 +879,8 @@
   unsigned LessIndex = 0;
   NumProbes = 0;
   while (true) {
-    bool Invalid = false;
     unsigned MiddleIndex = (GreaterIndex-LessIndex)/2+LessIndex;
-    unsigned MidOffset = getLocalSLocEntry(MiddleIndex, &Invalid).getOffset();
-    if (Invalid)
-      return FileID::get(0);
+    unsigned MidOffset = getLocalSLocEntry(MiddleIndex).getOffset();
 
     ++NumProbes;
 
@@ -898,15 +892,10 @@
     }
 
     // If the middle index contains the value, succeed and return.
-    // FIXME: This could be made faster by using a function that's aware of
-    // being in the local area.
-    if (isOffsetInFileID(FileID::get(MiddleIndex), SLocOffset)) {
-      FileID Res = FileID::get(MiddleIndex);
-
-      // If this isn't a macro expansion, remember it.  We have good locality
-      // across FileID lookups.
-      if (!LocalSLocEntryTable[MiddleIndex].isExpansion())
-        LastFileIDLookup = Res;
+    FileID Res = FileID::get(MiddleIndex);
+    if (isOffsetInLocalFileID(Res, SLocOffset)) {
+      // Remember it.  We have good locality across FileID lookups.
+      LastFileIDLookup = Res;
       NumBinaryProbes += NumProbes;
       return Res;
     }
@@ -944,9 +933,7 @@
     const SrcMgr::SLocEntry &E = getLoadedSLocEntry(I);
     if (E.getOffset() <= SLocOffset) {
       FileID Res = FileID::get(-int(I) - 2);
-
-      if (!E.isExpansion())
-        LastFileIDLookup = Res;
+      LastFileIDLookup = Res;
       NumLinearScans += NumProbes + 1;
       return Res;
     }
@@ -979,8 +966,7 @@
 
     if (isOffsetInFileID(FileID::get(-int(MiddleIndex) - 2), SLocOffset)) {
       FileID Res = FileID::get(-int(MiddleIndex) - 2);
-      if (!E.isExpansion())
-        LastFileIDLookup = Res;
+      LastFileIDLookup = Res;
       NumBinaryProbes += NumProbes;
       return Res;
     }
@@ -1695,10 +1681,7 @@
   // The location we're looking for isn't in the main file; look
   // through all of the local source locations.
   for (unsigned I = 0, N = local_sloc_entry_size(); I != N; ++I) {
-    bool Invalid = false;
-    const SLocEntry &SLoc = getLocalSLocEntry(I, &Invalid);
-    if (Invalid)
-      return FileID();
+    const SLocEntry &SLoc = getLocalSLocEntry(I);
 
     if (SLoc.isFile() && SLoc.getFile().getContentCache() &&
         SLoc.getFile().getContentCache()->OrigEntry == SourceFile)
Index: clang/include/clang/Basic/SourceManager.h
===================================================================
--- clang/include/clang/Basic/SourceManager.h
+++ clang/include/clang/Basic/SourceManager.h
@@ -1744,7 +1744,7 @@
     assert(ID != -1 && "Using FileID sentinel value");
     if (ID < 0)
       return getLoadedSLocEntryByID(ID, Invalid);
-    return getLocalSLocEntry(static_cast<unsigned>(ID), Invalid);
+    return getLocalSLocEntry(static_cast<unsigned>(ID));
   }
 
   const SrcMgr::SLocEntry &
@@ -1779,6 +1779,21 @@
     return SLocOffset < getSLocEntryByID(FID.ID+1).getOffset();
   }
 
+  /// A small optimization over isOffsetInFileID for when the caller has
+  /// already determined the FileID to be local, as opposed to loaded and the
+  /// SlocOffset to be before the NextLocalOffset. The more generic
+  /// isOffsetInFileID method should be preferred.
+  inline bool isOffsetInLocalFileID(FileID FID, unsigned SLocOffset) const {
+    assert(SLocOffset < NextLocalOffset && "SLocOffset non-local");
+    assert(FID.ID > 0 && "FID non-local or invalid");
+    const unsigned ID = static_cast<unsigned>(FID.ID);
+    if (SLocOffset < getLocalSLocEntry(ID).getOffset())
+      return false;
+    if (ID + 1 == LocalSLocEntryTable.size())
+      return true;
+    return SLocOffset < getLocalSLocEntry(ID + 1).getOffset();
+  }
+
   /// Returns the previous in-order FileID or an invalid FileID if there
   /// is no previous one.
   FileID getPreviousFileID(FileID FID) const;
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to