@@ -412,6 +412,74 @@ Example Usage
    __host__ __device__ int Four(void) __attribute__((weak, alias("_Z6__Fourv")));
    __host__ __device__ float Four(float f) __attribute__((weak, alias("_Z6__Fourf")));
 
+Profile Guided Optimization (PGO)
+=================================
+
+Clang supports Profile Guided Optimization (PGO) for HIP, enabling optimization
+of both host and device code based on runtime execution profiles.
+
+Workflow
+--------
+
+The PGO workflow consists of three phases:
+
+1. **Instrumented Build**: Compile with ``-fprofile-generate`` to create an
+   instrumented binary that collects execution profiles:
+
+   .. code-block:: shell
+
+      clang++ -O2 -fprofile-generate --offload-arch=gfx1200 -xhip app.hip -o app_instrumented
----------------
yxsamliu wrote:

No, device-only PGO via `-Xarch_device -fprofile-generate` does not work. Two
things require host-side instrumentation:

1. Shadow variable registration: `createHIPDeviceVariableRegistration()` runs
   as part of the PGO instrumentation lowering pass and registers shadow
   variables with the HIP runtime via `__hipRegisterVar`. Without this, the
   runtime cannot locate device profile data symbols.
2. Profile data collection: the instrumentation pass registers an atexit
   handler (`__llvm_profile_register_write_file_atexit`) that calls
   `__llvm_profile_write_file()` at program exit to write profile data to
   disk. Without host instrumentation, no atexit handler is registered and no
   profile data gets written.

Both host and device need `-fprofile-generate` for device PGO to function.
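The dependency described above can be sketched as a small, host-only C++ toy model. This is an illustrative assumption, not Clang's or the HIP runtime's implementation. `g_runtime_registry`, `host_startup`, `collect_device_profiles`, and `write_at_exit` are hypothetical stand-ins for the table filled by `__hipRegisterVar`, the startup code emitted by host instrumentation, and the work done by `__llvm_profile_write_file()`. The model shows that the device counter can exist while remaining invisible to the exit-time writer unless the host side both registers it and installs the handler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <map>
#include <string>

// Toy model only: every name below is hypothetical, not an LLVM or HIP API.

// Counter incremented by instrumented *device* code. It exists whether or not
// the host side was instrumented.
static unsigned long long g_device_kernel_counter = 42;

// Stand-in for the runtime's table of shadow variables; in the real flow the
// host fills it through __hipRegisterVar.
static std::map<std::string, unsigned long long*> g_runtime_registry;

// Records whether an exit-time writer was installed; in the real flow this is
// done through __llvm_profile_register_write_file_atexit.
static bool g_writer_installed = false;

// Stand-in for the device part of __llvm_profile_write_file(): it can only
// dump counters that the runtime can locate through the registry.
std::size_t collect_device_profiles() {
  std::size_t found = 0;
  for (const auto& entry : g_runtime_registry) {
    std::printf("device profile %s = %llu\n", entry.first.c_str(), *entry.second);
    ++found;
  }
  return found;
}

static void write_at_exit() { collect_device_profiles(); }

// What host-side instrumentation contributes at startup. Passing false models
// `-Xarch_device -fprofile-generate`: the device counter still exists, but
// nothing registers it and no exit-time writer is installed.
void host_startup(bool host_instrumented) {
  if (!host_instrumented) return;
  g_runtime_registry["my_kernel"] = &g_device_kernel_counter;  // (1) registration
  if (!g_writer_installed) {                                    // (2) exit writer
    std::atexit(write_at_exit);
    g_writer_installed = true;
  }
}
```

In this sketch, `host_startup(false)` leaves the registry empty and installs no handler, so `collect_device_profiles()` finds nothing. This mirrors the device-only case. `host_startup(true)` makes the counter visible and arranges for it to be dumped at exit.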


https://github.com/llvm/llvm-project/pull/177665
_______________________________________________
cfe-commits mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
