https://github.com/dzbarsky created https://github.com/llvm/llvm-project/pull/202624
Clang currently emits one character array per builtin diagnostic description and a separate 32-bit offset for every diagnostic. Exact duplicate descriptions are therefore stored repeatedly, and the offset table alone costs about 29 KiB.

Generate a StringTable containing each distinct description once and append its offset to the generated DIAG records. Store the 20-bit offset and 12-bit length inside the existing 10-byte StaticDiagInfoRec by reducing the diagnostic group index from 15 to 14 bits and repacking its two flags. TableGen and compile-time checks enforce all three field limits. Update CMake, Bazel, clangd, and diagtool for the extended DIAG signature.

On arm64 macOS Release builds from the same revision, fully stripped standalone clang decreases from 94,570,224 to 94,520,688 bytes, saving 49,536 bytes (0.052380%). DiagnosticIDs.cpp.o decreases from 1,081,768 to 1,029,272 bytes, saving 52,496 bytes (4.852797%), and libclangBasic.a decreases from 6,775,200 to 6,722,704 bytes, saving 52,496 bytes (0.774826%). At this point in the downstream LLVM 22 patch stack, the stripped all-tools multicall binary decreases from 143,039,208 to 142,989,664 bytes, saving 49,544 bytes (0.0346%).

An out-of-tree benchmark performing 36.495 million getDescription() lookups measured average user CPU time decreasing from 270.196 ms to 261.598 ms (3.181953%). Wall time was noisy under concurrent builds, so this is treated as evidence of no lookup regression rather than a claimed speedup.

Validation:
- Built standalone clang, BasicTests, clangd Diagnostics.cpp, and diagtool DiagnosticNames.cpp in Release mode.
- BasicTests passed 101 tests; 3 Release-only death tests were skipped.
- Compared all 7,299 generated descriptions with getDescription() exactly.
- Compared representative baseline and candidate diagnostic output byte-for-byte: 806 bytes over 13 lines with identical exit status.
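For readers who want the idea in isolation, here is a minimal standalone sketch of the two techniques described above: storing each distinct description once, and packing a 20-bit offset plus a 12-bit length into two 16-bit words. It is not the code from this patch. `DescriptionTable`, `PackedRec`, `makeRec`, and `describe` are hypothetical names made up for illustration, and the sketch uses only the standard library rather than `llvm::StringTable` or `StringToOffsetTable`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical stand-in for a generated string table: every distinct
// description is stored once, NUL-terminated, and identified by its offset.
class DescriptionTable {
public:
  // Returns the offset of S, appending it only if it was not seen before.
  uint32_t getOrAdd(const std::string &S) {
    auto It = Offsets.find(S);
    if (It != Offsets.end())
      return It->second; // Exact duplicate: reuse the existing bytes.
    uint32_t Offset = static_cast<uint32_t>(Data.size());
    Data.append(S);
    Data.push_back('\0');
    Offsets.emplace(S, Offset);
    return Offset;
  }
  const char *data() const { return Data.data(); }
  std::size_t size() const { return Data.size(); }

private:
  std::string Data;
  std::unordered_map<std::string, uint32_t> Offsets;
};

// Hypothetical record: the 20-bit offset is split into a full 16-bit low
// word and a 4-bit high part that shares a word with the 12-bit length.
struct PackedRec {
  uint16_t OffsetLow;
  uint16_t OffsetHigh : 4;
  uint16_t Len : 12;
};
// Bit-field layout is implementation-defined; this holds on common ABIs.
static_assert(sizeof(PackedRec) == 4, "offset and length should fit 4 bytes");

// Packs an offset and length, checking both field limits first.
inline PackedRec makeRec(uint32_t Offset, std::size_t Len) {
  assert(Offset < (1u << 20) && "offset exceeds 20 bits");
  assert(Len < (1u << 12) && "length exceeds 12 bits");
  PackedRec R;
  R.OffsetLow = static_cast<uint16_t>(Offset & 0xFFFF);
  R.OffsetHigh = static_cast<uint16_t>(Offset >> 16);
  R.Len = static_cast<uint16_t>(Len);
  return R;
}

// Reassembles the offset and returns a view of the stored description.
inline std::string_view describe(const DescriptionTable &T,
                                 const PackedRec &R) {
  uint32_t Offset = R.OffsetLow | (uint32_t(R.OffsetHigh) << 16);
  return std::string_view(T.data() + Offset, R.Len);
}
```

Calling `getOrAdd` twice with the same text yields the same offset, so two records built with `makeRec` can share one copy of the string, and `describe` recovers the text without a separate per-record offset array.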
Work towards #202616

>From a6194b8862dff0ab5684f67d37431ae03aaa3c1f Mon Sep 17 00:00:00 2001
From: David Zbarsky <[email protected]>
Date: Mon, 8 Jun 2026 17:16:27 -0400
Subject: [PATCH] [clang] Deduplicate builtin diagnostic descriptions

Clang currently emits one character array per builtin diagnostic description and a separate 32-bit offset for every diagnostic. Exact duplicate descriptions are therefore stored repeatedly, and the offset table alone costs about 29 KiB.

Generate a StringTable containing each distinct description once and append its offset to the generated DIAG records. Store the 20-bit offset and 12-bit length inside the existing 10-byte StaticDiagInfoRec by reducing the diagnostic group index from 15 to 14 bits and repacking its two flags. TableGen and compile-time checks enforce all three field limits. Update CMake, Bazel, clangd, and diagtool for the extended DIAG signature.

On arm64 macOS Release builds from the same revision, fully stripped standalone clang decreases from 94,570,224 to 94,520,688 bytes, saving 49,536 bytes (0.052380%). DiagnosticIDs.cpp.o decreases from 1,081,768 to 1,029,272 bytes, saving 52,496 bytes (4.852797%), and libclangBasic.a decreases from 6,775,200 to 6,722,704 bytes, saving 52,496 bytes (0.774826%). At this point in the downstream LLVM 22 patch stack, the stripped all-tools multicall binary decreases from 143,039,208 to 142,989,664 bytes, saving 49,544 bytes (0.0346%).

An out-of-tree benchmark performing 36.495 million getDescription() lookups measured average user CPU time decreasing from 270.196 ms to 261.598 ms (3.181953%). Wall time was noisy under concurrent builds, so this is treated as evidence of no lookup regression rather than a claimed speedup.

Validation:
- Built standalone clang, BasicTests, clangd Diagnostics.cpp, and diagtool DiagnosticNames.cpp in Release mode.
- BasicTests passed 101 tests; 3 Release-only death tests were skipped.
- Compared all 7,299 generated descriptions with getDescription() exactly.
- Compared representative baseline and candidate diagnostic output byte-for-byte: 806 bytes over 13 lines with identical exit status.
---
 clang-tools-extra/clangd/Diagnostics.cpp      |  2 +-
 clang/include/clang/Basic/CMakeLists.txt      |  5 ++
 clang/lib/Basic/DiagnosticIDs.cpp             | 65 ++++++-------------
 clang/tools/diagtool/DiagnosticNames.cpp      |  2 +-
 .../TableGen/ClangDiagnosticsEmitter.cpp      | 27 +++++++-
 clang/utils/TableGen/TableGen.cpp             |  6 ++
 clang/utils/TableGen/TableGenBackends.h       |  2 +
 .../llvm-project-overlay/clang/BUILD.bazel    |  5 ++
 8 files changed, 67 insertions(+), 47 deletions(-)

diff --git a/clang-tools-extra/clangd/Diagnostics.cpp b/clang-tools-extra/clangd/Diagnostics.cpp
index d0baf0224a18e..e63a4011f4e3d 100644
--- a/clang-tools-extra/clangd/Diagnostics.cpp
+++ b/clang-tools-extra/clangd/Diagnostics.cpp
@@ -55,7 +55,7 @@ const char *getDiagnosticCode(unsigned ID) {
   switch (ID) {
 #define DIAG(ENUM, CLASS, DEFAULT_MAPPING, DESC, GROPU, SFINAE, NOWERROR, \
              SHOWINSYSHEADER, SHOWINSYSMACRO, DEFERRABLE, CATEGORY, STABLE_ID, \
-             LEGACY_STABLE_IDS) \
+             LEGACY_STABLE_IDS, DESCRIPTION_OFFSET) \
   case clang::diag::ENUM: \
     return #ENUM;
 #include "clang/Basic/DiagnosticASTKinds.inc"
diff --git a/clang/include/clang/Basic/CMakeLists.txt b/clang/include/clang/Basic/CMakeLists.txt
index 20172622ca424..78b0f1f906a0a 100644
--- a/clang/include/clang/Basic/CMakeLists.txt
+++ b/clang/include/clang/Basic/CMakeLists.txt
@@ -51,6 +51,11 @@ clang_tablegen(DiagnosticAllCompatIDs.inc
   SOURCE Diagnostic.td
   TARGET ClangDiagnosticAllCompatIDs)
 
+clang_tablegen(DiagnosticDescriptions.inc
+  -gen-clang-diag-descriptions
+  SOURCE Diagnostic.td
+  TARGET ClangDiagnosticDescriptions)
+
 clang_tablegen(AttrList.inc -gen-clang-attr-list
   -I ${CMAKE_CURRENT_SOURCE_DIR}/../../
   SOURCE Attr.td
diff --git a/clang/lib/Basic/DiagnosticIDs.cpp b/clang/lib/Basic/DiagnosticIDs.cpp
index d9c5e4082c1a7..0dd7971501f61 100644
--- a/clang/lib/Basic/DiagnosticIDs.cpp
+++ b/clang/lib/Basic/DiagnosticIDs.cpp
@@ -36,44 +36,14 @@ struct StaticDiagInfoRec;
 #include "clang/Basic/DiagnosticStableIDs.inc"
 #undef GET_DIAG_STABLE_ID_ARRAYS
 
-// Store the descriptions in a separate table to avoid pointers that need to
-// be relocated, and also decrease the amount of data needed on 64-bit
-// platforms. See "How To Write Shared Libraries" by Ulrich Drepper.
-struct StaticDiagInfoDescriptionStringTable {
-#define DIAG(ENUM, CLASS, DEFAULT_SEVERITY, DESC, GROUP, SFINAE, NOWERROR, \
-             SHOWINSYSHEADER, SHOWINSYSMACRO, DEFERRABLE, CATEGORY, STABLE_ID, \
-             LEGACY_STABLE_IDS) \
-  char ENUM##_desc[sizeof(DESC)];
-#include "clang/Basic/AllDiagnosticKinds.inc"
-#undef DIAG
-};
-
-const StaticDiagInfoDescriptionStringTable StaticDiagInfoDescriptions = {
-#define DIAG(ENUM, CLASS, DEFAULT_SEVERITY, DESC, GROUP, SFINAE, NOWERROR, \
-             SHOWINSYSHEADER, SHOWINSYSMACRO, DEFERRABLE, CATEGORY, STABLE_ID, \
-             LEGACY_STABLE_IDS) \
-  DESC,
-#include "clang/Basic/AllDiagnosticKinds.inc"
-#undef DIAG
-};
+#include "clang/Basic/DiagnosticDescriptions.inc"
 
 extern const StaticDiagInfoRec StaticDiagInfo[];
 
-// Stored separately from StaticDiagInfoRec to pack better. Otherwise,
-// StaticDiagInfoRec would have extra padding on 64-bit platforms.
-const uint32_t StaticDiagInfoDescriptionOffsets[] = {
-#define DIAG(ENUM, CLASS, DEFAULT_SEVERITY, DESC, GROUP, SFINAE, NOWERROR, \
-             SHOWINSYSHEADER, SHOWINSYSMACRO, DEFERRABLE, CATEGORY, STABLE_ID, \
-             LEGACY_STABLE_IDS) \
-  offsetof(StaticDiagInfoDescriptionStringTable, ENUM##_desc),
-#include "clang/Basic/AllDiagnosticKinds.inc"
-#undef DIAG
-};
-
 const uint32_t StaticDiagInfoStableIDOffsets[] = {
 #define DIAG(ENUM, CLASS, DEFAULT_SEVERITY, DESC, GROUP, SFINAE, NOWERROR, \
              SHOWINSYSHEADER, SHOWINSYSMACRO, DEFERRABLE, CATEGORY, STABLE_ID, \
-             LEGACY_STABLE_IDS) \
+             LEGACY_STABLE_IDS, DESCRIPTION_OFFSET) \
   STABLE_ID,
 #include "clang/Basic/AllDiagnosticKinds.inc"
 #undef DIAG
@@ -82,7 +52,7 @@ const uint32_t StaticDiagInfoStableIDOffsets[] = {
 const uint32_t StaticDiagInfoLegacyStableIDStartOffsets[] = {
 #define DIAG(ENUM, CLASS, DEFAULT_SEVERITY, DESC, GROUP, SFINAE, NOWERROR, \
              SHOWINSYSHEADER, SHOWINSYSMACRO, DEFERRABLE, CATEGORY, STABLE_ID, \
-             LEGACY_STABLE_IDS) \
+             LEGACY_STABLE_IDS, DESCRIPTION_OFFSET) \
   LEGACY_STABLE_IDS,
 #include "clang/Basic/AllDiagnosticKinds.inc"
 #undef DIAG
@@ -111,25 +81,27 @@ struct StaticDiagInfoRec {
   uint16_t WarnNoWerror : 1;
   LLVM_PREFERRED_TYPE(bool)
   uint16_t WarnShowInSystemHeader : 1;
+  LLVM_PREFERRED_TYPE(diag::Group)
+  uint16_t OptionGroupIndex : 14;
   LLVM_PREFERRED_TYPE(bool)
   uint16_t WarnShowInSystemMacro : 1;
-
-  LLVM_PREFERRED_TYPE(diag::Group)
-  uint16_t OptionGroupIndex : 15;
   LLVM_PREFERRED_TYPE(bool)
   uint16_t Deferrable : 1;
 
-  uint16_t DescriptionLen;
+  uint16_t DescriptionOffsetLow;
+  uint16_t DescriptionOffsetHigh : 4;
+  uint16_t DescriptionLen : 12;
 
   unsigned getOptionGroupIndex() const { return OptionGroupIndex; }
 
   StringRef getDescription() const {
-    size_t MyIndex = this - &StaticDiagInfo[0];
-    uint32_t StringOffset = StaticDiagInfoDescriptionOffsets[MyIndex];
-    const char* Table = reinterpret_cast<const char*>(&StaticDiagInfoDescriptions);
-    return StringRef(&Table[StringOffset], DescriptionLen);
+    uint32_t Offset =
+        DescriptionOffsetLow | (uint32_t(DescriptionOffsetHigh) << 16);
+    return StringRef(StaticDiagInfoDescriptions.getCString(
+                         llvm::StringTable::Offset(Offset)),
+                     DescriptionLen);
   }
 
   StringRef getStableID() const {
@@ -159,6 +131,9 @@ struct StaticDiagInfoRec {
     return DiagID < RHS.DiagID;
   }
 };
+static_assert(sizeof(StaticDiagInfoRec) == 10);
+static_assert(static_cast<unsigned>(diag::Group::NUM_GROUPS) < (1U << 14),
+              "too many diagnostic groups for StaticDiagInfoRec");
 
 #define STRINGIFY_NAME(NAME) #NAME
 #define VALIDATE_DIAG_SIZE(NAME) \
@@ -191,7 +166,7 @@ const StaticDiagInfoRec StaticDiagInfo[] = {
 // clang-format off
 #define DIAG(ENUM, CLASS, DEFAULT_SEVERITY, DESC, GROUP, SFINAE, NOWERROR, \
              SHOWINSYSHEADER, SHOWINSYSMACRO, DEFERRABLE, CATEGORY, STABLE_ID, \
-             LEGACY_STABLE_IDS) \
+             LEGACY_STABLE_IDS, DESCRIPTION_OFFSET) \
   { \
       diag::ENUM, \
       DEFAULT_SEVERITY, \
@@ -200,9 +175,11 @@ const StaticDiagInfoRec StaticDiagInfo[] = {
       CATEGORY, \
       NOWERROR, \
       SHOWINSYSHEADER, \
-      SHOWINSYSMACRO, \
       GROUP, \
-      DEFERRABLE, \
+      SHOWINSYSMACRO, \
+      DEFERRABLE, \
+      uint16_t(DESCRIPTION_OFFSET), \
+      uint16_t(DESCRIPTION_OFFSET >> 16), \
       STR_SIZE(DESC, uint16_t)},
 #include "clang/Basic/DiagnosticCommonKinds.inc"
 #include "clang/Basic/DiagnosticDriverKinds.inc"
diff --git a/clang/tools/diagtool/DiagnosticNames.cpp b/clang/tools/diagtool/DiagnosticNames.cpp
index 1538167022aca..d6df6a297b080 100644
--- a/clang/tools/diagtool/DiagnosticNames.cpp
+++ b/clang/tools/diagtool/DiagnosticNames.cpp
@@ -29,7 +29,7 @@ llvm::ArrayRef<DiagnosticRecord> diagtool::getBuiltinDiagnosticsByName() {
 static const DiagnosticRecord BuiltinDiagnosticsByID[] = {
 #define DIAG(ENUM, CLASS, DEFAULT_MAPPING, DESC, GROUP, SFINAE, NOWERROR, \
              SHOWINSYSHEADER, SHOWINSYSMACRO, DEFER, CATEGORY, STABLE_ID, \
-             LEGACY_STABLE_IDS) \
+             LEGACY_STABLE_IDS, DESCRIPTION_OFFSET) \
   {#ENUM, diag::ENUM, STR_SIZE(#ENUM, uint8_t)},
 #include "clang/Basic/AllDiagnosticKinds.inc"
 #undef DIAG
diff --git a/clang/utils/TableGen/ClangDiagnosticsEmitter.cpp b/clang/utils/TableGen/ClangDiagnosticsEmitter.cpp
index ba10be060c20a..327fd05bffff9 100644
--- a/clang/utils/TableGen/ClangDiagnosticsEmitter.cpp
+++ b/clang/utils/TableGen/ClangDiagnosticsEmitter.cpp
@@ -1586,7 +1586,7 @@ namespace diag {
 enum {
 #define DIAG(ENUM, FLAGS, DEFAULT_MAPPING, DESC, GROUP, SFINAE, NOWERROR, \
              SHOWINSYSHEADER, SHOWINSYSMACRO, DEFERRABLE, CATEGORY, STABLE_ID, \
-             LEGACY_STABLE_IDS) \
+             LEGACY_STABLE_IDS, DESCRIPTION_OFFSET) \
   ENUM,
 #define %sSTART
 #include "clang/Basic/Diagnostic%sKinds.inc"
@@ -1805,6 +1805,29 @@ void clang::EmitClangDiagsStableIDs(const RecordKeeper &Records,
   StableIDs.emit(OS);
 }
 
+void clang::EmitClangDiagDescriptions(const RecordKeeper &Records,
+                                      raw_ostream &OS) {
+  DiagnosticTextBuilder DiagTextBuilder(Records);
+  StringToOffsetTable Descriptions;
+
+  OS << "enum {\n";
+  for (const Record &R :
+       make_pointee_range(Records.getAllDerivedDefinitions("Diagnostic"))) {
+    std::string Description = DiagTextBuilder.buildForDefinition(&R);
+    if (Description.size() >= (1U << 12))
+      PrintFatalError(R.getLoc(),
+                      "diagnostic description exceeds 12-bit length limit");
+
+    unsigned Offset = Descriptions.GetOrAddStringOffset(Description);
+    OS << " DIAG_DESC_OFFSET_" << R.getName() << " = " << Offset << ",\n";
+  }
+  OS << "};\n\n";
+
+  if (Descriptions.size() >= (1U << 20))
+    PrintFatalError("diagnostic descriptions exceed 20-bit offset limit");
+  Descriptions.EmitStringTableDef(OS, "StaticDiagInfoDescriptions");
+}
+
 /// ClangDiagsDefsEmitter - The top-level class emits .def files containing
 /// declarations of Clang diagnostics.
 void clang::EmitClangDiagsDefs(const RecordKeeper &Records, raw_ostream &OS,
@@ -1925,6 +1948,8 @@ void clang::EmitClangDiagsDefs(const RecordKeeper &Records, raw_ostream &OS,
         StableIDs.getLegacyStableIDsStartOffset(R.getName());
     OS << ", " << LegacyStableIDsStartOffset;
 
+    OS << ", DIAG_DESC_OFFSET_" << R.getName();
+
     OS << ")\n";
   }
 }
diff --git a/clang/utils/TableGen/TableGen.cpp b/clang/utils/TableGen/TableGen.cpp
index 0ce9d8306ae16..685c6f41c1be8 100644
--- a/clang/utils/TableGen/TableGen.cpp
+++ b/clang/utils/TableGen/TableGen.cpp
@@ -51,6 +51,7 @@ enum ActionType {
   GenClangBuiltins,
   GenClangBuiltinTemplates,
   GenClangDiagsCompatIDs,
+  GenClangDiagDescriptions,
   GenClangDiagsDefs,
   GenClangDiagsEnums,
   GenClangDiagGroups,
@@ -197,6 +198,8 @@ cl::opt<ActionType> Action(
                    "Generate clang builtins list"),
         clEnumValN(GenClangDiagsCompatIDs, "gen-clang-diags-compat-ids",
                    "Generate Clang diagnostic compatibility ids"),
+        clEnumValN(GenClangDiagDescriptions, "gen-clang-diag-descriptions",
+                   "Generate Clang diagnostic descriptions"),
         clEnumValN(GenClangDiagsDefs, "gen-clang-diags-defs",
                    "Generate Clang diagnostics definitions"),
         clEnumValN(GenClangDiagsEnums, "gen-clang-diags-enums",
@@ -459,6 +462,9 @@ bool ClangTableGenMain(raw_ostream &OS, const RecordKeeper &Records) {
   case GenClangDiagsCompatIDs:
     EmitClangDiagsCompatIDs(Records, OS, ClangComponent);
     break;
+  case GenClangDiagDescriptions:
+    EmitClangDiagDescriptions(Records, OS);
+    break;
   case GenClangDiagsDefs:
     EmitClangDiagsDefs(Records, OS, ClangComponent);
     break;
diff --git a/clang/utils/TableGen/TableGenBackends.h b/clang/utils/TableGen/TableGenBackends.h
index f9bd7ccf9d0d8..1dc98a35b0c3c 100644
--- a/clang/utils/TableGen/TableGenBackends.h
+++ b/clang/utils/TableGen/TableGenBackends.h
@@ -101,6 +101,8 @@ void EmitClangBuiltinTemplates(const llvm::RecordKeeper &Records,
 void EmitClangDiagsCompatIDs(const llvm::RecordKeeper &Records,
                              llvm::raw_ostream &OS,
                              const std::string &Component);
+void EmitClangDiagDescriptions(const llvm::RecordKeeper &Records,
+                               llvm::raw_ostream &OS);
 void EmitClangDiagsDefs(const llvm::RecordKeeper &Records,
                         llvm::raw_ostream &OS, const std::string &Component);
 void EmitClangDiagsEnums(const llvm::RecordKeeper &Records,
diff --git a/utils/bazel/llvm-project-overlay/clang/BUILD.bazel b/utils/bazel/llvm-project-overlay/clang/BUILD.bazel
index a67fed9c9953d..767859ad0b28d 100644
--- a/utils/bazel/llvm-project-overlay/clang/BUILD.bazel
+++ b/utils/bazel/llvm-project-overlay/clang/BUILD.bazel
@@ -107,6 +107,10 @@ gentbl_cc_library(
             "include/clang/Basic/DiagnosticStableIDs.inc",
             ["-gen-clang-diags-stable-ids"],
         ),
+        (
+            "include/clang/Basic/DiagnosticDescriptions.inc",
+            ["-gen-clang-diag-descriptions"],
+        ),
     ]),
     tblgen = ":clang-tblgen",
     td_file = "include/clang/Basic/Diagnostic.td",
@@ -708,6 +712,7 @@ cc_library(
         "include/clang/Basic/DiagnosticCommentKinds.inc",
         "include/clang/Basic/DiagnosticCommonKinds.inc",
         "include/clang/Basic/DiagnosticCrossTUKinds.inc",
+        "include/clang/Basic/DiagnosticDescriptions.inc",
         "include/clang/Basic/DiagnosticDriverKinds.inc",
         "include/clang/Basic/DiagnosticFrontendKinds.inc",
         "include/clang/Basic/DiagnosticGroups.inc",

_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
