llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang-driver

Author: Björn Svensson (bjosv)

<details>
<summary>Changes</summary>

This adds `-fsanitize-prefix-map=OLD=NEW` to remap source file paths in 
sanitizer metadata (ASan and UBSan), enabling reproducible builds.

AddressSanitizer and UndefinedBehaviorSanitizer embed source file paths in 
binary metadata (`.rodata`). This causes non-reproducible builds when compiling 
from different directories. See discussion: 
https://discourse.llvm.org/t/making-reproducible-builds-using-asan-and-ubsan/89115

While `-fsanitize-undefined-strip-path-components` exists for UBSan, there was 
no equivalent for ASan, and neither supported path remapping (only stripping).

### Changes

- Add `-fsanitize-prefix-map=OLD=NEW` flag
- Apply prefix map to ASan module name in `AddressSanitizer.cpp`
- Apply prefix map to UBSan source locations in `CGExpr.cpp`
- Make `-ffile-prefix-map` imply `-fsanitize-prefix-map` (consistent with 
debug/macro/coverage prefix maps)
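
The remapping applied in both places is a plain string-prefix substitution. A minimal standalone sketch of that rule, assuming C++17 and only the standard library (the name `remapPrefix` and the `PrefixMap` alias are hypothetical and do not appear in the patch):

```cpp
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical alias: ordered list of (OLD, NEW) pairs.
using PrefixMap = std::vector<std::pair<std::string, std::string>>;

// Hypothetical helper: returns Path with the first matching OLD prefix
// replaced by NEW, or Path unchanged when no entry matches.
std::string remapPrefix(std::string_view Path, const PrefixMap &Map) {
  for (const auto &[Old, New] : Map) {
    // Purely textual prefix test; no path normalization is attempted.
    if (Path.substr(0, Old.size()) == Old)
      return New + std::string(Path.substr(Old.size()));
  }
  return std::string(Path);
}
```

With the entry `("/build/dir/", "")`, `/build/dir/source.c` becomes `source.c`, and a path that matches no entry is returned unchanged. Because the test in this sketch is purely textual, an `OLD` value without a trailing separator also matches sibling directories such as `/build/dir2`.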

### Usage

```bash
# Remap paths for sanitizers only
clang -fsanitize=address -fsanitize-prefix-map=/build/dir/= source.c

# Or use -ffile-prefix-map (applies to debug info, macros, coverage, and
# sanitizers)
clang -fsanitize=address -ffile-prefix-map=/build/dir/= source.c
```

Alternatively, a flag such as `-fsanitize-address-strip-path-components=N` could 
be added instead (`-fsanitize-undefined-strip-path-components` already exists), 
but a prefix map seemed like the better solution.
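
For comparison, the component-stripping behaviour documented for `-fsanitize-undefined-strip-path-components=N` can be approximated as below. This is a rough sketch of the documented semantics, not Clang's implementation: `stripPathComponents` is a hypothetical name, only `/` is treated as a separator, out-of-range values of `N` are clamped, and counting the leading `/` as a component of its own is an assumption inferred from the documented `=2` example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper approximating the documented behaviour:
//   N > 0: drop the first N components; N < 0: keep the last -N components.
std::string stripPathComponents(const std::string &Path, int N) {
  std::vector<std::string> Parts;
  // Assumption: the root directory counts as one component.
  if (!Path.empty() && Path.front() == '/')
    Parts.push_back("/");
  std::size_t Start = 0;
  while (Start <= Path.size()) {
    std::size_t End = Path.find('/', Start);
    if (End == std::string::npos)
      End = Path.size();
    if (End > Start) // skip empty pieces produced by separators
      Parts.push_back(Path.substr(Start, End - Start));
    Start = End + 1;
  }

  const std::size_t Count = Parts.size();
  std::size_t First = 0; // index of the first component that is kept
  if (N > 0) {
    First = std::min<std::size_t>(static_cast<std::size_t>(N), Count);
  } else if (N < 0) {
    const std::size_t Keep =
        static_cast<std::size_t>(-static_cast<long long>(N));
    First = Keep < Count ? Count - Keep : 0;
  }
  if (First == 0)
    return Path; // nothing stripped

  std::string Result;
  for (std::size_t I = First; I < Count; ++I) {
    if (!Result.empty())
      Result += '/';
    Result += Parts[I];
  }
  return Result;
}
```

For `/code/library/file.cpp`, `N = 2` and `N = -2` both give `library/file.cpp`, and `N = -1` gives `file.cpp`. Stripping depends on the depth of the build directory, whereas a prefix map names the prefix explicitly.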

---
Full diff: https://github.com/llvm/llvm-project/pull/186908.diff


15 Files Affected:

- (modified) clang/docs/AddressSanitizer.rst (+22) 
- (modified) clang/docs/ReleaseNotes.rst (+4) 
- (modified) clang/docs/UndefinedBehaviorSanitizer.rst (+5) 
- (modified) clang/docs/UsersManual.rst (+2) 
- (modified) clang/include/clang/Basic/CodeGenOptions.h (+3) 
- (modified) clang/include/clang/Options/Options.td (+5) 
- (modified) clang/lib/CodeGen/BackendUtil.cpp (+2) 
- (modified) clang/lib/CodeGen/CGExpr.cpp (+10) 
- (modified) clang/lib/Driver/ToolChains/Clang.cpp (+16) 
- (modified) clang/lib/Frontend/CompilerInvocation.cpp (+9) 
- (added) clang/test/CodeGen/asan-prefix-map.cpp (+11) 
- (added) clang/test/CodeGen/ubsan-prefix-map.cpp (+9) 
- (added) clang/test/Driver/fsanitize-prefix-map.cpp (+8) 
- (modified) llvm/include/llvm/Transforms/Instrumentation/AddressSanitizer.h (+1)
- (modified) llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp (+25-10) 


``````````diff
diff --git a/clang/docs/AddressSanitizer.rst b/clang/docs/AddressSanitizer.rst
index c11d470d1defa..9a3ca9e53dc4f 100644
--- a/clang/docs/AddressSanitizer.rst
+++ b/clang/docs/AddressSanitizer.rst
@@ -405,6 +405,28 @@ run-time performance, which leads to increased binary size. Using the
 flag forces all code instrumentation to be outlined, which reduces the size
 of the generated code, but also reduces the run-time performance.
 
+Remapping source paths
+----------------------
+
+AddressSanitizer embeds the source file path in global metadata. For
+reproducible builds, the option ``-fsanitize-prefix-map=OLD=NEW`` can be used
+to remap these paths. If a source path starts with ``OLD``, it will be replaced
+with ``NEW``.
+
+Example
+^^^^^^^
+
+.. code-block:: console
+
+  # Strip build directory prefix
+  $ clang -fsanitize=address -fsanitize-prefix-map=/build/dir/= source.c
+
+  # Remap to a canonical path
+  $ clang -fsanitize=address -fsanitize-prefix-map=/home/user/project=/src source.c
+
+Multiple ``-fsanitize-prefix-map`` options can be specified; the first matching
+prefix wins.
+
 Limitations
 ===========
 
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 557e231a938d9..19002b58b23cc 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -393,6 +393,10 @@ Static Analyzer
 Sanitizers
 ----------
 
+- Added ``-fsanitize-prefix-map=OLD=NEW`` option to remap source file paths
+  in sanitizer metadata, enabling reproducible builds. This flag is also
+  implied by ``-ffile-prefix-map``.
+
 Python Binding Changes
 ----------------------
 - Add deprecation warnings to ``CompletionChunk.isKind...`` methods.
diff --git a/clang/docs/UndefinedBehaviorSanitizer.rst b/clang/docs/UndefinedBehaviorSanitizer.rst
index 60c619daecfc3..218a3855d1ed4 100644
--- a/clang/docs/UndefinedBehaviorSanitizer.rst
+++ b/clang/docs/UndefinedBehaviorSanitizer.rst
@@ -476,6 +476,10 @@ information. If ``N`` is positive, file information emitted by
 UndefinedBehaviorSanitizer will drop the first ``N`` components from the file
 path. If ``N`` is negative, the last ``N`` components will be kept.
 
+Alternatively, ``-fsanitize-prefix-map=OLD=NEW`` can be used to remap file
+paths. If a source path starts with ``OLD``, it will be replaced with ``NEW``.
+Both options can be combined; the prefix map is applied first.
+
 Example
 -------
 
@@ -486,6 +490,7 @@ For a file called ``/code/library/file.cpp``, here is what would be emitted:
 * ``-fsanitize-undefined-strip-path-components=2``: ``library/file.cpp``
 * ``-fsanitize-undefined-strip-path-components=-1``: ``file.cpp``
 * ``-fsanitize-undefined-strip-path-components=-2``: ``library/file.cpp``
+* ``-fsanitize-prefix-map=/code/=``: ``library/file.cpp``
 
 More Information
 ================
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index cec1e2a6a4677..8bd03103100e4 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -5273,6 +5273,8 @@ Execute ``clang-cl /?`` to see a list of supported options:
                               Enable origins tracking in MemorySanitizer
       -fsanitize-memory-use-after-dtor
                               Enable use-after-destroy detection in MemorySanitizer
+      -fsanitize-prefix-map=<old>=<new>
+                              Remap file source paths in sanitizer metadata.
       -fsanitize-recover=<value>
                               Enable recovery for specified sanitizers
       -fsanitize-stats        Enable sanitizer statistics gathering.
diff --git a/clang/include/clang/Basic/CodeGenOptions.h b/clang/include/clang/Basic/CodeGenOptions.h
index 8ef0d87faaeaf..73c475dd26c20 100644
--- a/clang/include/clang/Basic/CodeGenOptions.h
+++ b/clang/include/clang/Basic/CodeGenOptions.h
@@ -261,6 +261,9 @@ class CodeGenOptions : public CodeGenOptionsBase {
   /// file paths in coverage mapping.
   llvm::SmallVector<std::pair<std::string, std::string>, 0> CoveragePrefixMap;
 
+  /// Prefix replacement map for sanitizers to remap source file paths.
+  llvm::SmallVector<std::pair<std::string, std::string>, 0> SanitizePrefixMap;
+
   /// The ABI to use for passing floating point arguments.
   std::string FloatABI;
 
diff --git a/clang/include/clang/Options/Options.td b/clang/include/clang/Options/Options.td
index a274017953b1d..a47219784ad70 100644
--- a/clang/include/clang/Options/Options.td
+++ b/clang/include/clang/Options/Options.td
@@ -4840,6 +4840,11 @@ def fmacro_prefix_map_EQ
   Visibility<[ClangOption, CLOption, CC1Option]>,
     HelpText<"remap file source paths in predefined preprocessor macros and "
              "__builtin_FILE(). Implies -ffile-reproducible.">;
+def fsanitize_prefix_map_EQ
+  : Joined<["-"], "fsanitize-prefix-map=">, Group<f_Group>,
+    Visibility<[ClangOption, CC1Option]>,
+    MetaVarName<"<old>=<new>">,
+    HelpText<"Remap file source paths in sanitizer metadata">;
 defm force_dwarf_frame : BoolFOption<"force-dwarf-frame",
   CodeGenOpts<"ForceDwarfFrameSection">, DefaultFalse,
   PosFlag<SetTrue, [], [ClangOption, CC1Option],
diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp
index 94257fb96fc7f..e9240b4e1e5ef 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -745,6 +745,8 @@ static void addSanitizers(const Triple &TargetTriple,
         Opts.Recover = CodeGenOpts.SanitizeRecover.has(Mask);
         Opts.UseAfterScope = CodeGenOpts.SanitizeAddressUseAfterScope;
         Opts.UseAfterReturn = CodeGenOpts.getSanitizeAddressUseAfterReturn();
+        for (const auto &P : CodeGenOpts.SanitizePrefixMap)
+          Opts.PrefixMap.push_back({P.first, P.second});
         MPM.addPass(AddressSanitizerPass(Opts, UseGlobalGC, UseOdrIndicator,
                                          DestructorKind));
       }
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index 091e9c87c8ad4..c31945db6285b 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -3963,6 +3963,16 @@ llvm::Constant *CodeGenFunction::EmitCheckSourceLocation(SourceLocation Loc) {
   if (PLoc.isValid()) {
     StringRef FilenameString = PLoc.getFilename();
 
+    // Apply sanitize prefix map.
+    std::string RemappedFilename;
+    for (const auto &[Old, New] : CGM.getCodeGenOpts().SanitizePrefixMap) {
+      if (FilenameString.starts_with(Old)) {
+        RemappedFilename = (New + FilenameString.substr(Old.size())).str();
+        FilenameString = RemappedFilename;
+        break;
+      }
+    }
+
     int PathComponentsToStrip =
         CGM.getCodeGenOpts().EmitCheckPathComponentsToStrip;
     if (PathComponentsToStrip < 0) {
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index c16aa33f29ebb..5deff313db805 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -329,6 +329,21 @@ static void addCoveragePrefixMapArg(const Driver &D, const ArgList &Args,
   }
 }
 
+/// Add a CC1 option to specify the sanitizer file path prefix map.
+static void addSanitizePrefixMapArg(const Driver &D, const ArgList &Args,
+                                    ArgStringList &CmdArgs) {
+  for (const Arg *A : Args.filtered(options::OPT_ffile_prefix_map_EQ,
+                                    options::OPT_fsanitize_prefix_map_EQ)) {
+    StringRef Map = A->getValue();
+    if (!Map.contains('='))
+      D.Diag(diag::err_drv_invalid_argument_to_option)
+          << Map << A->getOption().getName();
+    else
+      CmdArgs.push_back(Args.MakeArgString("-fsanitize-prefix-map=" + Map));
+    A->claim();
+  }
+}
+
 /// Add -x lang to \p CmdArgs for \p Input.
 static void addDashXForInput(const ArgList &Args, const InputInfo &Input,
                              ArgStringList &CmdArgs) {
@@ -1175,6 +1190,7 @@ void Clang::AddPreprocessingOptions(Compilation &C, const JobAction &JA,
 
   addMacroPrefixMapArg(D, Args, CmdArgs);
   addCoveragePrefixMapArg(D, Args, CmdArgs);
+  addSanitizePrefixMapArg(D, Args, CmdArgs);
 
   Args.AddLastArg(CmdArgs, options::OPT_ffile_reproducible,
                   options::OPT_fno_file_reproducible);
diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp
index 6aa2afb6f5918..3a5abd970e055 100644
--- a/clang/lib/Frontend/CompilerInvocation.cpp
+++ b/clang/lib/Frontend/CompilerInvocation.cpp
@@ -1586,6 +1586,10 @@ void CompilerInvocationBase::GenerateCodeGenArgs(const CodeGenOptions &Opts,
     GenerateArg(Consumer, OPT_fcoverage_prefix_map_EQ,
                 Prefix.first + "=" + Prefix.second);
 
+  for (const auto &Prefix : Opts.SanitizePrefixMap)
+    GenerateArg(Consumer, OPT_fsanitize_prefix_map_EQ,
+                Prefix.first + "=" + Prefix.second);
+
   if (Opts.NewStructPathTBAA)
     GenerateArg(Consumer, OPT_new_struct_path_tbaa);
 
@@ -1896,6 +1900,11 @@ bool CompilerInvocation::ParseCodeGenArgs(CodeGenOptions &Opts, ArgList &Args,
     Opts.CoveragePrefixMap.emplace_back(Split.first, Split.second);
   }
 
+  for (const auto &Arg : Args.getAllArgValues(OPT_fsanitize_prefix_map_EQ)) {
+    auto Split = StringRef(Arg).split('=');
+    Opts.SanitizePrefixMap.emplace_back(Split.first, Split.second);
+  }
+
   const llvm::Triple::ArchType DebugEntryValueArchs[] = {
       llvm::Triple::x86,     llvm::Triple::x86_64, llvm::Triple::aarch64,
       llvm::Triple::arm,     llvm::Triple::armeb,  llvm::Triple::mips,
diff --git a/clang/test/CodeGen/asan-prefix-map.cpp b/clang/test/CodeGen/asan-prefix-map.cpp
new file mode 100644
index 0000000000000..d657e68735bac
--- /dev/null
+++ b/clang/test/CodeGen/asan-prefix-map.cpp
@@ -0,0 +1,11 @@
+// RUN: %clang_cc1 %s -triple=x86_64-linux-gnu -emit-llvm -fsanitize=address -o - | FileCheck %s -check-prefix=REGULAR
+// RUN: %clang_cc1 %s -triple=x86_64-linux-gnu -emit-llvm -fsanitize=address -fsanitize-prefix-map=%S/= -o - | FileCheck %s -check-prefix=REMAPPED
+
+// REGULAR: @___asan_gen_module = private constant [{{[0-9]+}} x i8] c"{{.*test(.|\\\\)CodeGen(.|\\\\)asan-prefix-map\.cpp}}\00"
+// REMAPPED: @___asan_gen_module = private constant [{{[0-9]+}} x i8] c"asan-prefix-map.cpp\00"
+
+int global;
+
+void f() {
+  global = 1;
+}
diff --git a/clang/test/CodeGen/ubsan-prefix-map.cpp b/clang/test/CodeGen/ubsan-prefix-map.cpp
new file mode 100644
index 0000000000000..88351744cc61b
--- /dev/null
+++ b/clang/test/CodeGen/ubsan-prefix-map.cpp
@@ -0,0 +1,9 @@
+// RUN: %clang %s -target x86_64-linux-gnu -emit-llvm -S -fsanitize=undefined -o - | FileCheck %s -check-prefix=REGULAR
+// RUN: %clang %s -target x86_64-linux-gnu -emit-llvm -S -fsanitize=undefined -fsanitize-prefix-map=%S/= -o - | FileCheck %s -check-prefix=REMAPPED
+
+// REGULAR: @{{.*}} = {{.*}} c"{{.*test(.|\\\\)CodeGen(.|\\\\)ubsan-prefix-map\.cpp}}\00"
+// REMAPPED: @{{.*}} = {{.*}} c"ubsan-prefix-map.cpp\00"
+
+int f(int x, int y) {
+  return x / y;
+}
diff --git a/clang/test/Driver/fsanitize-prefix-map.cpp b/clang/test/Driver/fsanitize-prefix-map.cpp
new file mode 100644
index 0000000000000..ed1e38d6dbbea
--- /dev/null
+++ b/clang/test/Driver/fsanitize-prefix-map.cpp
@@ -0,0 +1,8 @@
+// RUN: %clang %s -### -o %t.o -fsanitize=address -fsanitize-prefix-map=/old=/new 2>&1 | FileCheck %s --check-prefix=SANITIZE
+// RUN: %clang %s -### -o %t.o -fsanitize=undefined -fsanitize-prefix-map=/old=/new 2>&1 | FileCheck %s --check-prefix=SANITIZE
+// RUN: %clang %s -### -o %t.o -fsanitize=address -ffile-prefix-map=/old=/new 2>&1 | FileCheck %s --check-prefix=FILE
+// RUN: %clang %s -### -o %t.o -fsanitize=undefined -ffile-prefix-map=/old=/new 2>&1 | FileCheck %s --check-prefix=FILE
+// RUN: not %clang -### -fsanitize-prefix-map=old %s 2>&1 | FileCheck %s --check-prefix=INVALID
+// SANITIZE: "-fsanitize-prefix-map=/old=/new"
+// FILE: "-fsanitize-prefix-map=/old=/new"
+// INVALID: error: invalid argument 'old' to -fsanitize-prefix-map
diff --git a/llvm/include/llvm/Transforms/Instrumentation/AddressSanitizer.h b/llvm/include/llvm/Transforms/Instrumentation/AddressSanitizer.h
index 2a9fe91b32f3c..919b32a129a79 100644
--- a/llvm/include/llvm/Transforms/Instrumentation/AddressSanitizer.h
+++ b/llvm/include/llvm/Transforms/Instrumentation/AddressSanitizer.h
@@ -30,6 +30,7 @@ struct AddressSanitizerOptions {
   int InstrumentationWithCallsThreshold = 7000;
   uint32_t MaxInlinePoisoningSize = 64;
   bool InsertVersionCheck = true;
+  std::vector<std::pair<std::string, std::string>> PrefixMap;
 };
 
 /// Public interface to the address sanitizer module pass for instrumenting code
diff --git a/llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp b/llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp
index 2d4c4c9b057c6..7af4fcc8387c0 100644
--- a/llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp
+++ b/llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp
@@ -928,11 +928,13 @@ struct AddressSanitizer {
 
 class ModuleAddressSanitizer {
 public:
-  ModuleAddressSanitizer(Module &M, bool InsertVersionCheck,
-                         bool CompileKernel = false, bool Recover = false,
-                         bool UseGlobalsGC = true, bool UseOdrIndicator = true,
-                         AsanDtorKind DestructorKind = AsanDtorKind::Global,
-                         AsanCtorKind ConstructorKind = AsanCtorKind::Global)
+  ModuleAddressSanitizer(
+      Module &M, bool InsertVersionCheck, bool CompileKernel = false,
+      bool Recover = false, bool UseGlobalsGC = true,
+      bool UseOdrIndicator = true,
+      AsanDtorKind DestructorKind = AsanDtorKind::Global,
+      AsanCtorKind ConstructorKind = AsanCtorKind::Global,
+      std::vector<std::pair<std::string, std::string>> PrefixMap = {})
       : M(M),
         CompileKernel(ClEnableKasan.getNumOccurrences() > 0 ? ClEnableKasan
                                                             : CompileKernel),
@@ -959,7 +961,8 @@ class ModuleAddressSanitizer {
         DestructorKind(DestructorKind),
         ConstructorKind(ClConstructorKind.getNumOccurrences() > 0
                             ? ClConstructorKind
-                            : ConstructorKind) {
+                            : ConstructorKind),
+        PrefixMap(std::move(PrefixMap)) {
     C = &(M.getContext());
     int LongSize = M.getDataLayout().getPointerSizeInBits();
     IntptrTy = Type::getIntNTy(*C, LongSize);
@@ -1022,6 +1025,7 @@ class ModuleAddressSanitizer {
   bool UseCtorComdat;
   AsanDtorKind DestructorKind;
   AsanCtorKind ConstructorKind;
+  std::vector<std::pair<std::string, std::string>> PrefixMap;
   Type *IntptrTy;
   PointerType *PtrTy;
   LLVMContext *C;
@@ -1309,7 +1313,8 @@ PreservedAnalyses AddressSanitizerPass::run(Module &M,
 
   ModuleAddressSanitizer ModuleSanitizer(
       M, Options.InsertVersionCheck, Options.CompileKernel, Options.Recover,
-      UseGlobalGC, UseOdrIndicator, DestructorKind, ConstructorKind);
+      UseGlobalGC, UseOdrIndicator, DestructorKind, ConstructorKind,
+      Options.PrefixMap);
   bool Modified = false;
   auto &FAM = MAM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
   const StackSafetyGlobalInfo *const SSGI =
@@ -2808,9 +2813,19 @@ GlobalVariable *ModuleAddressSanitizer::getOrCreateModuleName() {
   if (!ModuleName) {
     // We shouldn't merge same module names, as this string serves as unique
     // module ID in runtime.
-    ModuleName =
-        createPrivateGlobalForString(M, M.getModuleIdentifier(),
-                                     /*AllowMerging*/ false, genName("module"));
+    std::string ModuleNameStr = M.getModuleIdentifier();
+
+    // Apply prefix map remapping.
+    for (const auto &[Old, New] : PrefixMap) {
+      if (StringRef(ModuleNameStr).starts_with(Old)) {
+        ModuleNameStr = New + ModuleNameStr.substr(Old.size());
+        break;
+      }
+    }
+
+    ModuleName = createPrivateGlobalForString(M, ModuleNameStr,
+                                              /*AllowMerging*/ false,
+                                              genName("module"));
   }
   return ModuleName;
 }

``````````
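
In the diff above, the driver rejects values that contain no `=`, and the frontend splits each value at the first `=`. A minimal sketch of that parsing step, assuming C++17 and only the standard library (`parsePrefixMapEntry` is a hypothetical name, not part of the patch):

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper: splits "OLD=NEW" at the first '='.
// Returns std::nullopt when the argument contains no '=' at all.
std::optional<std::pair<std::string, std::string>>
parsePrefixMapEntry(std::string_view Arg) {
  const std::size_t Eq = Arg.find('=');
  if (Eq == std::string_view::npos)
    return std::nullopt; // the case the driver diagnoses as invalid
  return std::make_pair(std::string(Arg.substr(0, Eq)),
                        std::string(Arg.substr(Eq + 1)));
}
```

A value such as `/build/dir/=` yields an empty `NEW`, which strips the prefix. Since the split happens at the first `=`, an `OLD` prefix that itself contains `=` cannot be expressed in this form.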

</details>


https://github.com/llvm/llvm-project/pull/186908
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
