================
@@ -412,6 +412,74 @@ Example Usage
__host__ __device__ int Four(void) __attribute__((weak, alias("_Z6__Fourv")));
__host__ __device__ float Four(float f) __attribute__((weak, alias("_Z6__Fourf")));

+Profile Guided Optimization (PGO)
+=================================
+
+Clang supports Profile Guided Optimization (PGO) for HIP, enabling optimization
+of both host and device code based on runtime execution profiles.
+
+Workflow
+--------
+
+The PGO workflow consists of three phases:
+
+1. **Instrumented Build**: Compile with ``-fprofile-generate`` to create an
+   instrumented binary that collects execution profiles:
+
+   .. code-block:: shell
+
+      clang++ -O2 -fprofile-generate --offload-arch=gfx1200 -xhip app.hip -o app_instrumented
----------------
yxsamliu wrote:
No, device-only PGO via `-Xarch_device -fprofile-generate` does not work. Two
things require host-side instrumentation:

1. Shadow variable registration: `createHIPDeviceVariableRegistration()` runs as
   part of the PGO instrumentation lowering pass and registers shadow variables
   with the HIP runtime via `__hipRegisterVar`. Without this, the runtime cannot
   locate the device profile data symbols.
2. Profile data collection: the instrumentation pass registers an atexit handler
   (`__llvm_profile_register_write_file_atexit`) that calls
   `__llvm_profile_write_file()` at program exit to write profile data to disk.
   Without host instrumentation, no atexit handler is registered and no profile
   data is written.

Both host and device need `-fprofile-generate` for device PGO to function.
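
To make the dependency concrete, here is a toy C++ model. It is a sketch, not the actual LLVM instrumentation pass, compiler-rt, or HIP runtime code, and `ToyHostProcess`, `registerDeviceVars`, `installExitWriter`, and `atExit` are hypothetical names. In it, device counters only reach the "profile" when the host both registers the device symbols and installs an exit-time writer:

```cpp
// Toy model of the two host-side duties described above. It is NOT the
// LLVM InstrProfiling pass, compiler-rt, or the HIP runtime; every name
// here is hypothetical and chosen only for illustration.
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Stand-in for a device-side array of profile counters.
using Counters = std::vector<unsigned long long>;

struct ToyHostProcess {
  // Duty (1), loosely modeled on __hipRegisterVar: maps a device symbol name
  // to the storage the runtime would read back. Empty unless the host
  // registers it.
  std::map<std::string, const Counters*> registered;
  // Duty (2), loosely modeled on the atexit writer: false unless the host
  // installs it.
  bool exitWriterInstalled = false;

  void registerDeviceVars(const std::map<std::string, const Counters*>& syms) {
    for (const auto& [name, storage] : syms) registered[name] = storage;
  }

  void installExitWriter() { exitWriterInstalled = true; }

  // Models program exit: returns the profile that would reach disk.
  std::map<std::string, Counters> atExit() const {
    std::map<std::string, Counters> profile;
    if (!exitWriterInstalled) return profile;  // no handler: nothing written
    for (const auto& [name, storage] : registered) {
      assert(storage != nullptr);
      profile[name] = *storage;  // copy the counters "back to the host"
    }
    return profile;  // still empty if no symbols were registered
  }
};
```

In this model a device-only build corresponds to calling neither method; a process that only registers symbols, or only installs the writer, also ends up with an empty profile. Only a process that does both returns the device counters from `atExit()`.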
https://github.com/llvm/llvm-project/pull/177665
_______________________________________________
cfe-commits mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits