https://github.com/likewhatevs created https://github.com/llvm/llvm-project/pull/180402
Hi, awesome project, thank you! I dabble with kernel stuff and no LSP works well there. This PR aims to improve on that by adding better support for a (arguably too) common pattern in kernel stuff, the unity build pattern. The tl;dr of this build pattern is "include one c file in another c file because reasons (make builds faster, i think)". It makes clangd less effective because it leads to there being no entries in the compilation database for included c files. This PR works around that by using using already-present heuristics to propagate parent files's include's such that things resolve in c files included in c files correctly. I tried to do this previously in the script where we generate a compilation database from build files (here: https://lore.kernel.org/all/[email protected]/), and while that did make folks's ide's happy, that ultimately ended up not working (here: https://lkml.org/lkml/2025/12/17/1294) because it broke some automation somewhere, so I am trying to make a similar-but-more-correct version of that patch here. I tested this PR two ways, via a script which counted errors while building files in a subdir of the linux kernel and (ofc) in vim. ### The script test showed a 70% reduction in errors for files following the unity build pattern (0's excluded from results): #### system installed clangd: ``` ❯ python ~/clangd_errors.py /usr/bin/clangd --dir kernel # kernel: 35 unity build targets 58 kernel/locking/rwbase_rt.c 9 kernel/sched/core_sched.c 4 kernel/sched/deadline.c 11 kernel/sched/debug.c 1319 kernel/sched/ext.c 444 kernel/sched/ext_idle.c 1 kernel/sched/idle.c 1 kernel/sched/loadavg.c 389 kernel/sched/psi.c 1 kernel/sched/rt.c 169 kernel/trace/trace_selftest.c files 35 2406 total errors 2480 total diagnostics (errors + warnings) ``` #### pr clangd: ``` ❯ python ~/clangd_errors.py ~/repos/llvm-project/build/bin/clangd --dir kernel # kernel: 35 unity build targets 58 kernel/locking/rwbase_rt.c 9 kernel/sched/core_sched.c 11 kernel/sched/debug.c 1 kernel/sched/loadavg.c 389 kernel/sched/psi.c 169 kernel/trace/trace_selftest.c files 35 637 total errors 732 total diagnostics (errors + warnings) ``` ### The vim test (the best test, my happy developer with go-to-definition, greyed-out-disabled-code and tab-auto-complete test) showed the following: #### system installed clangd: <img width="2880" height="1800" alt="Screenshot From 2026-02-08 04-50-48" src="https://github.com/user-attachments/assets/d203ac3d-5c91-428a-bcb3-721dc6fa9cef" /> #### pr clangd: <img width="2880" height="1800" alt="Screenshot From 2026-02-08 04-06-22" src="https://github.com/user-attachments/assets/798b6625-e18b-407f-a2fb-15e7de8bc25e" /> Please LMK what you think, thanks! >From 808630c426710ed1c945c0f8ed7a4dd4e2c69218 Mon Sep 17 00:00:00 2001 From: Pat Somaru <[email protected]> Date: Sat, 7 Feb 2026 21:27:09 -0500 Subject: [PATCH] [clangd] Inject preceding includes for unity build targets When a .c file is #include'd by another .c file (unity builds), inject the preceding includes from the proxy as -include flags so the target gets the declarations it depends on. After the existing proxy compile command resolution (which handles both warm-cache via HeaderIncluderCache and cold-start via interpolated CDB commands), check the Heuristic field set by transferCompileCommand / InterpolatingCompilationDatabase. If the command was inferred from a non-header file for a non-header target, preprocess the proxy file to collect resolved #include paths preceding the target and inject them. The preprocessing follows the same pattern as scanPreamble() in Preamble.cpp but uses real VFS instead of an empty one, so that command-line includes (e.g. -include kconfig.h) are processed and to avoid re-scanning on every update. Signed-off-by: Pat Somaru <[email protected]> --- clang-tools-extra/clangd/TUScheduler.cpp | 131 +++++++++++++++++ .../clangd/unittests/TUSchedulerTests.cpp | 139 ++++++++++++++++++ 2 files changed, 270 insertions(+) diff --git a/clang-tools-extra/clangd/TUScheduler.cpp b/clang-tools-extra/clangd/TUScheduler.cpp index 0661ecb58008e..20864e8e125f4 100644 --- a/clang-tools-extra/clangd/TUScheduler.cpp +++ b/clang-tools-extra/clangd/TUScheduler.cpp @@ -54,6 +54,7 @@ #include "GlobalCompilationDatabase.h" #include "ParsedAST.h" #include "Preamble.h" +#include "SourceCode.h" #include "clang-include-cleaner/Record.h" #include "support/Cancellation.h" #include "support/Context.h" @@ -65,6 +66,7 @@ #include "support/Trace.h" #include "clang/Basic/Stack.h" #include "clang/Frontend/CompilerInvocation.h" +#include "clang/Frontend/FrontendActions.h" #include "clang/Tooling/CompilationDatabase.h" #include "llvm/ADT/FunctionExtras.h" #include "llvm/ADT/STLExtras.h" @@ -349,6 +351,80 @@ bool isReliable(const tooling::CompileCommand &Cmd) { return Cmd.Heuristic.empty(); } +/// Run the preprocessor over a proxy file to collect resolved paths of +/// #include directives that precede TargetFile. Uses real VFS for correct +/// #ifdef evaluation and include path resolution. +/// +/// This follows the same pattern as scanPreamble() in Preamble.cpp but uses +/// real VFS instead of an empty one, so that command-line includes (e.g. +/// -include kconfig.h) are processed and #ifdef guards evaluate correctly. +std::vector<std::string> +scanPrecedingIncludes(const ThreadsafeFS &TFS, + const tooling::CompileCommand &ProxyCmd, + PathRef TargetFile) { + // Resolve the proxy file's absolute path and read its contents. + llvm::SmallString<256> ProxyFile(ProxyCmd.Filename); + if (!llvm::sys::path::is_absolute(ProxyFile)) { + ProxyFile = ProxyCmd.Directory; + llvm::sys::path::append(ProxyFile, ProxyCmd.Filename); + } + llvm::sys::path::remove_dots(ProxyFile, /*remove_dot_dot=*/true); + + auto FS = TFS.view(std::nullopt); + + // Canonicalize TargetFile to match Inc.Resolved (which uses canonical + // paths from getCanonicalPath). This ensures symlinks don't cause + // mismatches. + llvm::SmallString<256> CanonTarget; + if (FS->getRealPath(TargetFile, CanonTarget)) + CanonTarget = TargetFile; // Fallback if canonicalization fails. + llvm::sys::path::remove_dots(CanonTarget, /*remove_dot_dot=*/true); + + auto Buf = FS->getBufferForFile(ProxyFile); + if (!Buf) + return {}; + + // Build a compiler invocation from the proxy's compile command. + ParseInputs PI; + PI.Contents = (*Buf)->getBuffer().str(); + PI.TFS = &TFS; + PI.CompileCommand = ProxyCmd; + IgnoringDiagConsumer IgnoreDiags; + auto CI = buildCompilerInvocation(PI, IgnoreDiags); + if (!CI) + return {}; + CI->getDiagnosticOpts().IgnoreWarnings = true; + + auto ContentsBuffer = llvm::MemoryBuffer::getMemBuffer(PI.Contents); + auto Clang = prepareCompilerInstance( + std::move(CI), /*Preamble=*/nullptr, std::move(ContentsBuffer), + TFS.view(std::nullopt), IgnoreDiags); + if (!Clang || Clang->getFrontendOpts().Inputs.empty()) + return {}; + + // Run preprocessor and collect main-file includes via IncludeStructure. + PreprocessOnlyAction Action; + if (!Action.BeginSourceFile(*Clang, Clang->getFrontendOpts().Inputs[0])) + return {}; + IncludeStructure Includes; + Includes.collect(*Clang); + if (llvm::Error Err = Action.Execute()) { + llvm::consumeError(std::move(Err)); + return {}; + } + Action.EndSourceFile(); + + // Return resolved paths of includes that precede TargetFile. + std::vector<std::string> Result; + for (const auto &Inc : Includes.MainFileIncludes) { + if (Inc.Resolved == CanonTarget) + return Result; + if (!Inc.Resolved.empty()) + Result.push_back(Inc.Resolved); + } + return {}; // TargetFile not found among the proxy's includes. +} + /// Threadsafe manager for updating a TUStatus and emitting it after each /// update. class SynchronizedTUStatus { @@ -765,6 +841,12 @@ class ASTWorker { std::atomic<unsigned> ASTBuildCount = {0}; std::atomic<unsigned> PreambleBuildCount = {0}; + /// Cached result of scanPrecedingIncludes for unity build support. + /// Only accessed from the worker thread, no locking needed. + std::string CachedProxyPath; + llvm::sys::TimePoint<> CachedProxyMTime; + std::vector<std::string> CachedPrecedingIncludes; + SynchronizedTUStatus Status; PreambleThread PreamblePeer; }; @@ -885,6 +967,55 @@ void ASTWorker::update(ParseInputs Inputs, WantDiagnostics WantDiags, } } } + // Unity builds: when a non-header .c file has a command inferred from + // another non-header .c file (e.g. ext.c inferred from build_policy.c), + // preprocess the proxy to find preceding #include directives and inject + // them as -include flags. This works for both warm-cache (proxy from + // HeaderIncluderCache, command via transferCompileCommand) and cold-start + // (interpolated command from CDB), since both set the Heuristic field. + if (Cmd && !isHeaderFile(FileName)) { + llvm::StringRef Heuristic = Cmd->Heuristic; + if (Heuristic.consume_front("inferred from ") && + !isHeaderFile(Heuristic)) { + // Resolve proxy path (Heuristic may be a relative filename). + llvm::SmallString<256> ProxyPath(Heuristic); + if (!llvm::sys::path::is_absolute(ProxyPath)) { + ProxyPath = Cmd->Directory; + llvm::sys::path::append(ProxyPath, Heuristic); + } + llvm::sys::path::remove_dots(ProxyPath, /*remove_dot_dot=*/true); + // Use cached result if the proxy hasn't changed, otherwise + // fetch the proxy's compile command and preprocess it. + auto ProxyStat = Inputs.TFS->view(std::nullopt) + ->status(ProxyPath); + auto ProxyMTime = ProxyStat + ? ProxyStat->getLastModificationTime() + : llvm::sys::TimePoint<>(); + std::vector<std::string> *Preceding = nullptr; + if (CachedProxyPath == ProxyPath && + CachedProxyMTime == ProxyMTime) { + Preceding = &CachedPrecedingIncludes; + } else if (auto ProxyCmd = + CDB.getCompileCommand(ProxyPath)) { + if (isReliable(*ProxyCmd)) { + CachedProxyPath = std::string(ProxyPath); + CachedProxyMTime = ProxyMTime; + CachedPrecedingIncludes = scanPrecedingIncludes( + *Inputs.TFS, *ProxyCmd, FileName); + Preceding = &CachedPrecedingIncludes; + } + } + if (Preceding && !Preceding->empty()) { + std::vector<std::string> Flags; + for (const auto &Inc : *Preceding) { + Flags.push_back("-include"); + Flags.push_back(Inc); + } + auto It = llvm::find(Cmd->CommandLine, "--"); + Cmd->CommandLine.insert(It, Flags.begin(), Flags.end()); + } + } + } if (Cmd) Inputs.CompileCommand = std::move(*Cmd); else diff --git a/clang-tools-extra/clangd/unittests/TUSchedulerTests.cpp b/clang-tools-extra/clangd/unittests/TUSchedulerTests.cpp index c6862b5eba6fa..b5cab2efab611 100644 --- a/clang-tools-extra/clangd/unittests/TUSchedulerTests.cpp +++ b/clang-tools-extra/clangd/unittests/TUSchedulerTests.cpp @@ -1438,6 +1438,145 @@ TEST_F(TUSchedulerTests, IncluderCache) { << "association invalidated and then claimed by main3"; } +// When a non-header file (e.g. child.c) is #included by another non-header +// file (parent.c), the proxy mechanism should inject -include flags for +// includes that precede child.c in parent.c. Header files should not get +// preceding includes injected even when they have a proxy. +TEST_F(TUSchedulerTests, IncluderCacheNonHeaderProxy) { + static std::string Parent = testPath("parent.c"), + Child = testPath("child.c"), + Header = testPath("header.h"), + NoCmd = testPath("no_cmd.h"); + struct NonHeaderCDB : public GlobalCompilationDatabase { + std::optional<tooling::CompileCommand> + getCompileCommand(PathRef File) const override { + if (File == Child || File == NoCmd) + return std::nullopt; + auto Basic = getFallbackCommand(File); + Basic.Heuristic.clear(); + if (File == Parent) + Basic.CommandLine.push_back("-DPARENT"); + return Basic; + } + } CDB; + TUScheduler S(CDB, optsForTest()); + auto GetFlags = [&](PathRef File) { + S.update(File, getInputs(File, ";"), WantDiagnostics::Yes); + EXPECT_TRUE(S.blockUntilIdle(timeoutSeconds(60))); + Notification CmdDone; + tooling::CompileCommand Cmd; + S.runWithPreamble( + "GetFlags", File, TUScheduler::StaleOrAbsent, + [&](llvm::Expected<InputsAndPreamble> Inputs) { + ASSERT_FALSE(!Inputs) << Inputs.takeError(); + Cmd = std::move(Inputs->Command); + CmdDone.notify(); + }); + CmdDone.wait(); + EXPECT_TRUE(S.blockUntilIdle(timeoutSeconds(60))); + return Cmd.CommandLine; + }; + + FS.Files[Header] = ""; + FS.Files[Child] = ";"; + FS.Files[NoCmd] = ";"; + + // Parent includes header.h, then no_cmd.h, then child.c. + // no_cmd.h has no CDB entry, so it gets parent.c as proxy. + const char *ParentContents = R"cpp( + #include "header.h" + #include "no_cmd.h" + #include "child.c" + )cpp"; + FS.Files[Parent] = ParentContents; + S.update(Parent, getInputs(Parent, ParentContents), + WantDiagnostics::Yes); + EXPECT_TRUE(S.blockUntilIdle(timeoutSeconds(60))); + + // child.c is a non-header included by a non-header: should get + // -include flags for preceding includes. + auto ChildFlags = GetFlags(Child); + EXPECT_THAT(ChildFlags, Contains("-DPARENT")) + << "child.c should use parent.c's compile command"; + EXPECT_THAT(ChildFlags, Contains("-include")) + << "child.c should have preceding includes injected"; + + // no_cmd.h is a header included by a non-header: should get proxy + // flags but NOT preceding -include injection. + auto HdrFlags = GetFlags(NoCmd); + EXPECT_THAT(HdrFlags, Contains("-DPARENT")) + << "no_cmd.h should use parent.c's compile command"; + EXPECT_THAT(HdrFlags, Not(Contains("-include"))) + << "header files should not get preceding includes"; +} + +// Cold-start: when a non-header file is opened directly without the proxy +// (parent) ever being opened, the proxy cache is empty. The cold-start path +// should preprocess the proxy file (respecting #ifdef guards) and inject +// preceding includes. +TEST_F(TUSchedulerTests, IncluderCacheNonHeaderProxyColdStart) { + // Cold-start: child.c is opened directly without parent.c being opened + // first. The CDB returns an interpolated command for child.c inferred from + // parent.c. Preprocessing parent.c should find preceding includes and + // inject them as -include flags. + static std::string Parent = testPath("parent.c"), + Child = testPath("child.c"), + Header = testPath("header.h"), + Sibling = testPath("sibling.c"); + FS.Files[Header] = ""; + FS.Files[Sibling] = ";"; + FS.Files[Child] = ";"; + FS.Files[Parent] = R"cpp( + #include "header.h" + #include "sibling.c" + #include "child.c" + )cpp"; + + struct InterpolatingCDB : public GlobalCompilationDatabase { + std::optional<tooling::CompileCommand> + getCompileCommand(PathRef File) const override { + if (File == Parent) { + auto Basic = getFallbackCommand(File); + Basic.Heuristic.clear(); + Basic.CommandLine.push_back("-DPARENT"); + return Basic; + } + if (File == Child) { + // Simulate interpolated command inferred from parent.c + auto Basic = getFallbackCommand(File); + Basic.Heuristic = "inferred from " + Parent; + Basic.CommandLine.push_back("-DPARENT"); + return Basic; + } + return std::nullopt; + } + } CDB; + + TUScheduler S(CDB, optsForTest()); + S.update(Child, getInputs(Child, ";"), WantDiagnostics::Yes); + EXPECT_TRUE(S.blockUntilIdle(timeoutSeconds(60))); + + Notification CmdDone; + tooling::CompileCommand Cmd; + S.runWithPreamble( + "GetFlags", Child, TUScheduler::StaleOrAbsent, + [&](llvm::Expected<InputsAndPreamble> Inputs) { + ASSERT_FALSE(!Inputs) << Inputs.takeError(); + Cmd = std::move(Inputs->Command); + CmdDone.notify(); + }); + CmdDone.wait(); + EXPECT_TRUE(S.blockUntilIdle(timeoutSeconds(60))); + + EXPECT_THAT(Cmd.CommandLine, Contains("-DPARENT")) + << "child.c should use the interpolated compile command"; + EXPECT_THAT(Cmd.CommandLine, Contains("-include")) + << "child.c should have preceding includes from cold-start proxy scan"; + // header.h and sibling.c should precede child.c. + EXPECT_THAT(Cmd.CommandLine, Contains(Header)); + EXPECT_THAT(Cmd.CommandLine, Contains(Sibling)); +} + TEST_F(TUSchedulerTests, PreservesLastActiveFile) { for (bool Sync : {false, true}) { auto Opts = optsForTest(); _______________________________________________ cfe-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
