llvmorg-github-actions[bot] wrote:
@llvm/pr-subscribers-clang-codegen

Author: Tony Varghese (tonykuttai)

<details>
<summary>Changes</summary>

This patch introduces a new Clang command-line option, `-mloadtime-comment-vars=`, which accepts a comma-separated list of global variable names to preserve as loadtime identifying strings in the final object file. It ensures that these string variables (for example, ones that embed `sccsid` or version information) are kept in the object file instead of being stripped during aggressive garbage collection. The option is AIX-specific and is silently ignored on other targets. This complements the `#pragma comment(copyright, ...)` feature by supporting codebases that use this older pattern.

This is a stacked PR on top of [[Analysis][AIX] Add !implicit.ref globals as ThinLTO summary ref edges to support pragma comment(copyright) LTO interaction](https://github.com/llvm/llvm-project/pull/199358#top), which in turn depends on [[PowerPC][AIX] Support #pragma comment copyright for AIX](https://github.com/llvm/llvm-project/pull/178184#top).
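The patch describes a selection rule: the name must be listed, the type must be a character pointer or character array, the variable must have an initializer, and the target must be AIX. That rule can be modeled outside Clang. The sketch below is a hypothetical, standalone C++ model, not Clang code. `VarKind`, `GlobalVarModel`, `splitCommaList`, `isValidLoadTimeCommentVar`, `selectPreserved` and `testFileDecls` are illustrative names only. The real logic operates on `VarDecl`s in `CodeGenModule::isValidLoadTimeCommentVariable` and `CodeGenModule::EmitLoadTimeCommentVars`.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a file-scope variable declaration. Clang's real
// check inspects a VarDecl's QualType; here the type is reduced to the few
// categories the documented rule distinguishes.
enum class VarKind { CharPointer, CharArray, Int, Struct };

struct GlobalVarModel {
  std::string Name;
  VarKind Kind;
  bool HasInit;
};

// Split "a,b,c" into {"a", "b", "c"}, dropping empty entries.
// (In Clang, the driver's CommaJoined option kind does the splitting.)
std::vector<std::string> splitCommaList(const std::string &Value) {
  std::vector<std::string> Out;
  std::stringstream SS(Value);
  std::string Item;
  while (std::getline(SS, Item, ','))
    if (!Item.empty())
      Out.push_back(Item);
  return Out;
}

// Documented rule: a character pointer or character array with an
// initializer. Clang uses isAnyCharacterType(), which also accepts wide and
// Unicode character types; this model folds those into the "char" kinds.
bool isValidLoadTimeCommentVar(const GlobalVarModel &V) {
  if (!V.HasInit)
    return false;
  return V.Kind == VarKind::CharPointer || V.Kind == VarKind::CharArray;
}

// Return, in declaration order, the names that would be preserved.
// Requested names with no matching declaration are simply never matched,
// and non-AIX targets preserve nothing.
std::vector<std::string>
selectPreserved(const std::vector<GlobalVarModel> &Decls,
                const std::string &FlagValue, bool TargetIsAIX) {
  std::vector<std::string> Kept;
  if (!TargetIsAIX)
    return Kept;
  const std::vector<std::string> Requested = splitCommaList(FlagValue);
  for (const GlobalVarModel &V : Decls) {
    bool Listed = std::find(Requested.begin(), Requested.end(), V.Name) !=
                  Requested.end();
    if (Listed && isValidLoadTimeCommentVar(V))
      Kept.push_back(V.Name);
  }
  return Kept;
}

// Declarations mirroring clang/test/CodeGen/loadtime-comment-vars.c.
std::vector<GlobalVarModel> testFileDecls() {
  return {
      {"sccsid", VarKind::CharPointer, true},
      {"version", VarKind::CharArray, true},
      {"copyright", VarKind::CharPointer, true},
      {"build_number", VarKind::Int, true},
      {"build_data", VarKind::Struct, true},
  };
}
```

Apply the sketch to the declarations from the patch's test file, with the flag value `sccsid,version,build_number`. `selectPreserved` then returns `sccsid` and `version`, which matches that test's `CHECK` lines. `build_number` is requested but is an `int`. `copyright` has a valid type but is not requested.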

---

Patch is 26.50 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/187986.diff

9 Files Affected:

- (modified) clang/docs/LanguageExtensions.rst (+66)
- (modified) clang/include/clang/Basic/CodeGenOptions.h (+3)
- (modified) clang/include/clang/Options/Options.td (+7)
- (modified) clang/lib/CodeGen/CodeGenModule.cpp (+77)
- (modified) clang/lib/CodeGen/CodeGenModule.h (+8)
- (modified) clang/lib/Driver/ToolChains/Clang.cpp (+5)
- (added) clang/test/CodeGen/loadtime-comment-vars.c (+37)
- (modified) llvm/lib/Transforms/Utils/LowerCommentStringPass.cpp (+184-87)
- (added) llvm/test/Transforms/LowerCommentString/loadtime-comment-vars.ll (+34)


``````````diff
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index 992c6f2927be8..e6da6bd88108b 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -6844,6 +6844,72 @@ When ``#pragma comment(copyright, ...)`` appears in a C++20 module
 interface unit, the copyright string is embedded only in the object file
 compiled from that interface unit. Importing TUs do not re-emit the string.
 
+Preserving Identifying Variables with -mloadtime-comment-vars
+--------------------------------------------------------------
+
+The ``-mloadtime-comment-vars=`` flag accepts a comma-separated list of
+global variable names that should be preserved in the final object file as
+loadtime identifying strings. This is an AIX-specific feature and is silently
+ignored on other targets.
+
+This flag complements ``#pragma comment(copyright, ...)`` for codebases that
+already use the traditional UNIX convention of embedding identifying strings
+directly in source variables rather than via a pragma.
+
+**Syntax**
+
+.. code-block:: console
+
+   -mloadtime-comment-vars=<var1>[,<var2>,...]
+
+**Valid variable types**
+
+A variable named in the list must meet both of these conditions to be
+preserved:
+
+- Its type must be a character pointer (``char *``, ``const char *``) or a
+  character array (``char[]``).
+- It must have an initializer.
+
+Variables that fail either check -- for example, an ``int`` or a ``struct`` --
+are silently skipped. Variables that appear in the list but are not defined in
+the translation unit are also ignored.
+
+**Example**
+
+.. code-block:: c
+
+   static char *sccsid = "@(#) MyApp Version 1.0";
+   static char version[] = "@(#) Built 2026-05-24";
+
+   void foo() {}
+
+Compiled with:
+
+.. code-block:: console
+
+   clang -target powerpc64-ibm-aix \
+     -mloadtime-comment-vars=sccsid,version \
+     -c source.c -o source.o
+
+Both ``sccsid`` and ``version`` survive optimization and garbage collection and
+are visible in the object file:
+
+.. code-block:: console
+
+   $ what source.o
+   source.o:
+     MyApp Version 1.0
+     Built 2026-05-24
+
+**Interaction with** ``#pragma comment(copyright, ...)``
+
+The two mechanisms can be used together in the same translation unit. The
+pragma produces a dedicated ``__loadtime_comment_str`` symbol placed in the
+``__loadtime_comment`` section, while ``-mloadtime-comment-vars`` preserves
+the named source variables in place using ``.ref`` directives. Both sets of
+strings appear in the final object file independently.
+
 Evaluating Object Size
 ======================
 
diff --git a/clang/include/clang/Basic/CodeGenOptions.h b/clang/include/clang/Basic/CodeGenOptions.h
index e43112b4bb98b..54b2fd2077d7b 100644
--- a/clang/include/clang/Basic/CodeGenOptions.h
+++ b/clang/include/clang/Basic/CodeGenOptions.h
@@ -334,6 +334,9 @@ class CodeGenOptions : public CodeGenOptionsBase {
   /// A list of linker options to embed in the object file.
   std::vector<std::string> LinkerOptions;
 
+  /// List of global variable names to preserve as loadtime comment variables.
+  std::vector<std::string> LoadTimeCommentVars;
+
   /// Name of the profile file to use as output for -fprofile-instr-generate,
   /// -fprofile-generate, and -fcs-profile-generate.
   std::string InstrProfileOutput;
diff --git a/clang/include/clang/Options/Options.td b/clang/include/clang/Options/Options.td
index 5dab4af7618fc..dea04b41b51f9 100644
--- a/clang/include/clang/Options/Options.td
+++ b/clang/include/clang/Options/Options.td
@@ -4765,6 +4765,13 @@ def fvisibility_global_new_delete_EQ : Joined<["-"], "fvisibility-global-new-del
   Visibility<[ClangOption, CC1Option]>,
   HelpText<"The visibility for global C++ operator new and delete declarations. If 'source' is specified the visibility is not adjusted">,
   MarshallingInfoVisibilityGlobalNewDelete<LangOpts<"GlobalAllocationFunctionVisibility">, "ForceDefault">;
+def mloadtime_comment_vars_EQ
+    : CommaJoined<["-"], "mloadtime-comment-vars=">,
+      Group<m_Group>,
+      Visibility<[ClangOption, CC1Option]>,
+      HelpText<"Comma-separated list of global variable names to treat as "
+               "loadtime variables">,
+      MarshallingInfoStringVector<CodeGenOpts<"LoadTimeCommentVars">>;
 def mdefault_visibility_export_mapping_EQ : Joined<["-"], "mdefault-visibility-export-mapping=">,
   Values<"none,explicit,all">,
   NormalizedValuesScope<"LangOptions::DefaultVisiblityExportMapping">,
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index 84fcec42adee2..0eb7d8eb986b2 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -79,6 +79,7 @@
 #include "llvm/Transforms/Instrumentation/KCFI.h"
 #include "llvm/Transforms/Utils/BuildLibCalls.h"
 #include "llvm/Transforms/Utils/KCFIHash.h"
+#include "llvm/Transforms/Utils/ModuleUtils.h"
 #include <optional>
 #include <set>
 
@@ -1744,6 +1745,9 @@ void CodeGenModule::Release() {
 
   EmitLoadTimeComment();
 
+  // Emit loadtime comment variables specified via -mloadtime-comment-vars.
+  EmitLoadTimeCommentVars();
+
   // If there is device offloading code embed it in the host now.
   EmbedObject(&getModule(), CodeGenOpts, *getFileSystem(), getDiags());
 
@@ -4243,6 +4247,79 @@ void CodeGenModule::EmitLoadTimeComment() {
   }
 }
 
+/// Check if a variable declaration is suitable to be treated as a loadtime
+/// comment variable. Valid variables must be character pointers or character
+/// arrays with an initializer.
+bool CodeGenModule::isValidLoadTimeCommentVariable(const VarDecl *D) const {
+  // Must be a valid declaration and must have an initializer (the string).
+  if (!D || !D->hasInit())
+    return false;
+
+  QualType Ty = D->getType();
+
+  // 1. Handle Pointers (e.g., char *sccsid, const char *copyright).
+  if (const PointerType *PT = Ty->getAs<PointerType>()) {
+    if (PT->getPointeeType()->isAnyCharacterType())
+      return true;
+  }
+
+  // 2. Handle Arrays (e.g., char version[])
+  // use ASTContext::getAsArrayType to safely unwrap constant arrays.
+  if (const ArrayType *AT = getContext().getAsArrayType(Ty)) {
+    if (AT->getElementType()->isAnyCharacterType())
+      return true;
+  }
+
+  return false; // Reject ints, structs, etc.
+}
+
+/// Emit global variables specified via -mloadtime-comment-vars as loadtime
+/// comment variables. These variables are tagged with metadata and marked as
+/// used to prevent garbage collection.
+void CodeGenModule::EmitLoadTimeCommentVars() {
+  if (!getTriple().isOSAIX())
+    return;
+
+  const auto &LoadTimeCommentVars = getCodeGenOpts().LoadTimeCommentVars;
+  if (LoadTimeCommentVars.empty())
+    return;
+
+  TranslationUnitDecl *TU = getContext().getTranslationUnitDecl();
+  for (auto *D : TU->decls()) {
+    VarDecl *VD = dyn_cast<VarDecl>(D);
+    if (!VD)
+      continue;
+
+    // Check if the variable name is in the loadtime comment vars list.
+    if (!llvm::is_contained(LoadTimeCommentVars, VD->getName()))
+      continue;
+
+    if (!isValidLoadTimeCommentVariable(VD))
+      continue;
+
+    llvm::Constant *Addr = GetAddrOfGlobalVar(VD);
+
+    auto *GV = dyn_cast<llvm::GlobalVariable>(Addr->stripPointerCasts());
+    if (!GV)
+      continue;
+
+    // Force Clang to emit the definition if it skipped it.
+    if (GV->isDeclaration())
+      EmitGlobalDefinition(VD);
+
+    if (GV->isDeclaration())
+      continue;
+
+    // Record the variable name in named module metadata.
+    llvm::NamedMDNode *MD =
+        getModule().getOrInsertNamedMetadata("loadtime_comment.vars");
+    llvm::Metadata *Ops[] = {
+        llvm::MDString::get(getLLVMContext(), VD->getName())};
+    MD->addOperand(llvm::MDNode::get(getLLVMContext(), Ops));
+    llvm::appendToCompilerUsed(getModule(), {GV});
+  }
+}
+
 bool CodeGenModule::MayBeEmittedEagerly(const ValueDecl *Global) {
   // In OpenMP 5.0 variables and function may be marked as
   // device_type(host/nohost) and we should not emit them eagerly unless we sure
diff --git a/clang/lib/CodeGen/CodeGenModule.h b/clang/lib/CodeGen/CodeGenModule.h
index 2106ad26d54f0..eab2fd67f0e5b 100644
--- a/clang/lib/CodeGen/CodeGenModule.h
+++ b/clang/lib/CodeGen/CodeGenModule.h
@@ -2175,6 +2175,14 @@ class CodeGenModule : public CodeGenTypeCache {
   /// Emit the load-time comment metadata (e.g., from
   /// #pragma comment(copyright, ...)) for the translation unit.
   void EmitLoadTimeComment();
+
+  /// Check if a variable declaration is suitable to be treated as a loadtime
+  /// comment variable (must be a character pointer or array with initializer).
+  bool isValidLoadTimeCommentVariable(const VarDecl *D) const;
+
+  /// Emit global variables specified via -mloadtime-comment-vars as loadtime
+  /// comment variables, tagging them with metadata and preventing removal.
+  void EmitLoadTimeCommentVars();
 };
 
 } // end namespace CodeGen
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index df49877d4bf62..e7038703744ed 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -6203,6 +6203,11 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
   else if (UnwindTables)
     CmdArgs.push_back("-funwind-tables=1");
 
+  // Forward loadtime-comment vars option to cc1.
+  if (Arg *A = Args.getLastArg(options::OPT_mloadtime_comment_vars_EQ)) {
+    A->render(Args, CmdArgs);
+  }
+
   // Sframe unwind tables are independent of the other types. Although also
   // defined for aarch64, only x86_64 support is implemented at the moment.
   if (Arg *A = Args.getLastArg(options::OPT_gsframe)) {
diff --git a/clang/test/CodeGen/loadtime-comment-vars.c b/clang/test/CodeGen/loadtime-comment-vars.c
new file mode 100644
index 0000000000000..99c7fd7cc50d4
--- /dev/null
+++ b/clang/test/CodeGen/loadtime-comment-vars.c
@@ -0,0 +1,37 @@
+// RUN: %clang_cc1 -O2 -triple powerpc-ibm-aix -mloadtime-comment-vars=sccsid,version,build_number -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s
+// RUN: %clang_cc1 -O2 -triple powerpc64-ibm-aix -mloadtime-comment-vars=sccsid,version,build_number -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s
+
+// String pointer
+static char *sccsid = "@(#) sccsid Version 1.0";
+
+// String array
+static char version[] = "@(#) Copyright Version 2.0";
+
+// Const string (Not in CLI list, should NOT be emitted)
+static const char *copyright = "@(#) Copyright 2026";
+
+// Integer (In CLI list but invalid type, should NOT be emitted)
+static int build_number = 12345;
+
+// Struct (not in CLI list and invalid type, NOT emitted)
+struct build_info {
+  int major;
+  int minor;
+} static build_data = {1, 0};
+
+void foo() {}
+
+// CHECK: @sccsid = internal global ptr @.str, align {{[0-9]+}}
+// CHECK: @.str = private unnamed_addr constant [24 x i8] c"@(#) sccsid Version 1.0\00", align {{[0-9]+}}
+// CHECK: @version = internal global [27 x i8] c"@(#) Copyright Version 2.0\00", align {{[0-9]+}}
+// CHECK: @llvm.compiler.used = appending global [2 x ptr] [ptr @sccsid, ptr @version], section "llvm.metadata"
+
+// Ensure unrequested/invalid variables are not emitted
+// CHECK-NOT: @copyright
+// CHECK-NOT: @build_number
+// CHECK-NOT: @build_data
+
+// Verify named metadata contains the preserved variable names
+// CHECK: !loadtime_comment.vars = !{![[MD_SCC:[0-9]+]], ![[MD_VER:[0-9]+]]}
+// CHECK: ![[MD_SCC]] = !{!"sccsid"}
+// CHECK: ![[MD_VER]] = !{!"version"}
\ No newline at end of file
diff --git a/llvm/lib/Transforms/Utils/LowerCommentStringPass.cpp b/llvm/lib/Transforms/Utils/LowerCommentStringPass.cpp
index 501c7949bfc3d..70022f6ac57c1 100644
--- a/llvm/lib/Transforms/Utils/LowerCommentStringPass.cpp
+++ b/llvm/lib/Transforms/Utils/LowerCommentStringPass.cpp
@@ -4,41 +4,86 @@
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
-//===---------------------------------------------------------------------===//
+//===----------------------------------------------------------------------===//
+//
+// This pass processes copyright and variable metadata for AIX, handling two
+// distinct mechanisms:
+//
+//   1. #pragma comment(copyright, "...") - TU-wide copyright strings
+//   2. -mloadtime-comment-vars=<names> - User-specified global variables
+//
+// Both types of information must be preserved in the final object file and
+// survive optimization passes including DCE and LTO.
 //
-// This pass lowers the module-level comment string metadata emitted by Clang:
+// === #pragma comment(copyright, "...") ===
+//
+// Clang emits module-level metadata for copyright pragmas:
 //
 //   !comment_string.loadtime = !{!"Copyright ..."}
 //
-// into concrete, translation-unit-weak-hidden globals.
-// This Pass is enabled only for AIX.
-// For each module (translation unit), the pass performs the following:
+//
+// This pass materializes the metadata into a concrete TU-weak hidden global
+// variable:
+//
+//   1. Creates a null-terminated, weak_odr constant string global
+//      `__loadtime_comment_str_HASH` containing the copyright text with section
+//      attribute "__loadtime_comment". The backend emits this to a special
+//      section in the object file.
+//
+//   2. Marks the global in `llvm.compiler.used` to prevent removal by
+//      optimization passes.
+//
+//   3. Attaches `!implicit.ref` metadata to every defined function,
+//      referencing the global. The PowerPC AIX backend emits a `.ref`
+//      directive for each reference, creating relocations that prevent the
+//      linker from discarding the string.
+//
+// === -mloadtime-comment-vars=<names> ===
+//
+// Clang stores the names of user-specified global variables (e.g., char
+// *sccsid, char version[]) in module-level metadata:
+//
+//   !loadtime_comment.vars = !{!{!"sccsid"}, !{!"version"}}
+//
+// This pass:
+//
+//   1. Reads the variable names from the metadata and looks up each global
+//      by name using M.getNamedGlobal().
 //
-//   1. Creates a null-terminated, weak_odr hidden constant string global
-//      (`__loadtime_comment_str`) containing the copyright text with
-//      section attribute "__loadtime_comment". The backend places this
-//      in the .text section of the object file.
+//   2. Attaches `!implicit.ref` metadata to every defined function,
+//      referencing each tagged global. This ensures the variables survive
+//      optimization and linking.
 //
-//   2. Marks the string in `llvm.compiler.used` so it cannot be dropped by
-//      optimization or LTO.
+// === Output Example ===
 //
-//   3. Attaches `!implicit.ref` metadata referencing the string to every
-//      defined function in the module. The PowerPC AIX backend recognizes
-//      this metadata and emits a `.ref` directive from the function to the
-//      string, creating a concrete relocation that prevents the linker from
-//      discarding the string (as long as the referencing symbol is kept).
+// Input IR:
 //
-// Input IR:
-//   !comment_string.loadtime = !{!"Copyright"}
-// Output IR:
-//   @__loadtime_comment_str_HASH = weak_odr constant [N x i8]
-//   c"Copyright\00",
-//   section "__loadtime_comment"
-//   @llvm.compiler.used = appending global [1 x ptr] [ptr
-//   @__loadtime_comment_str_HASH]
+//   @sccsid = internal global ptr @.str, align 8
+//   @.str = private unnamed_addr constant [24 x i8] c"@(#) sccsid
+//   Version 1.0\00", align 1
+//   @llvm.compiler.used = appending global [1 x ptr] [ptr @sccsid], section
+//   "llvm.metadata"
+//   !comment_string.loadtime = !{!1}
+//   !loadtime_comment.vars = !{!2}
+//   !1 = !{!"Pragma comment copyright"}
+//   !2 = !{!"sccsid"}
 //
-//   define i32 @func() !implicit.ref !5 { ... }
-//   !5 = !{ptr @__loadtime_comment_str_HASH}
+// Output IR:
+//   @sccsid = internal global ptr @.str, align 8
+//   @.str = private unnamed_addr constant [24 x i8] c"@(#) sccsid
+//   Version 1.0\00", align 1
+//   @__loadtime_comment_str_HASH = weak_odr unnamed_addr constant [25 x i8]
+//   c"Pragma comment copyright\00", section "__loadtime_comment", align 1,
+//   !guid !0
+//   @llvm.compiler.used = appending global [2 x ptr] [ptr @sccsid, ptr
+//   @__loadtime_comment_str_HASH], section "llvm.metadata"
+//
+//   define void @foo() !implicit.ref !1 !implicit.ref !2 {
+//   entry:
+//     ret void
+//   }
+//
+//   !1 = !{ptr @__loadtime_comment_str_HASH}
+//   !2 = !{ptr @sccsid}
 //
 //===----------------------------------------------------------------------===//
 
@@ -86,76 +131,128 @@ PreservedAnalyses LowerCommentStringPass::run(Module &M,
 
   LLVMContext &Ctx = M.getContext();
 
-  // Single-metadata: !comment_string.loadtime = !{!0}
-  // Each operand node is expected to have one MDString operand.
+  // This pass processes two types of copyright/identifying information:
+  //   1. A single TU-wide copyright string from #pragma comment(copyright, "...")
+  //   2. Multiple user-specified variables from -mloadtime-comment-vars=...
+  //
+  // Both need implicit references from every function to survive DCE and LTO.
+  // Collect all copyright globals, then create implicit references
+  // from every function definition to each global. This forces the backend
+  // to treat them as reachable and preserve them in the final object file.
+  SmallVector<GlobalValue *, 4> CopyrightGlobals;
+
+  // =========================================================================
+  // Process #pragma comment(copyright, "...") - at most one per TU
+  // =========================================================================
+  // Frontend emits module-level metadata:
+  //   !comment_string.loadtime = !{!0}
+  //   !0 = !{!"Copyright text here"}
+  //
+  // We materialize this as a global string in the __loadtime_comment section,
+  // which linkers recognize and include in the object file's loadtime
+  // comment area.
   NamedMDNode *MD = M.getNamedMetadata("comment_string.loadtime");
-  if (!MD || MD->getNumOperands() == 0)
-    return PreservedAnalyses::all();
+  if (MD && MD->getNumOperands() > 0) {
+    MDNode *MdNode = MD->getOperand(0);
+    if (MdNode && MdNode->getNumOperands() > 0) {
+      auto *MdString = dyn_cast_or_null<MDString>(MdNode->getOperand(0));
+      if (MdString && !MdString->getString().empty()) {
+        StringRef Text = MdString->getString();
+
+        uint64_t Hash = xxh3_64bits(Text);
+        std::string GlobalName =
+            ("__loadtime_comment_str_" + Twine::utohexstr(Hash)).str();
 
-  // At this point we are guaranteed that one TU contains a single copyright
-  // metadata entry. Create TU-local string global for that metadata entry.
-  MDNode *MdNode = MD->getOperand(0);
-  if (!MdNode || MdNode->getNumOperands() == 0)
-    return PreservedAnalyses::all();
+        // Create a null-terminated string constant in the special section.
+        Constant *StrInit =
+            ConstantDataArray::getString(Ctx, Text, /*AddNull=*/ true);
+        // The global variable should be weak_odr, constant, and hidden.
+        auto *StrGV = new GlobalVariable(M, StrInit->getType(),
+                                         /*isConstant=*/true,
+                                         GlobalValue::WeakODRLinkage, StrInit,
+                                         GlobalName);
+        StrGV->setVisibility(GlobalValue::HiddenVisibility);
+        StrGV->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
+        StrGV->setAlignment(Align(1));
+        // Backend recognizes this section and emits it to .loadtime_comment.
+        StrGV->setSection("__loadtime_comment");
+        // Assign a stable GUID to the global string created.
+        uint64_t GUID = llvm::MD5Hash(GlobalName);
+        StrGV->setMetadata("guid",
+            MDNode::get(Ctx, {ConstantAsMetadata::get(ConstantInt::get(
+                                 Type::getInt64Ty(Ctx), GUID))}));
+        // Prevent removal by optimizer passes (but not sufficient for linker).
+        appendToCompilerUsed(M, {StrGV});
+        // Add to list - will get implicit refs from all functions below.
+        CopyrightGlobals.push_back(StrGV);
+      }
+    }
+    // Clean up the metadata as we have consumed it.
+    MD->eraseFromParent();
+  }
 
-  auto *MdString = dyn_cast_or_null<MDString>(MdNode->getOperand(0));
- ...
[truncated]

``````````

</details>

https://github.com/llvm/llvm-project/pull/187986

_______________________________________________
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
