tqchen opened a new pull request, #589:
URL: https://github.com/apache/tvm-ffi/pull/589

   ## Summary
   
   Adds three header-only macros (`TVM_FFI_COLD_CODE`, `TVM_FFI_PREDICT_FALSE`, 
`TVM_FFI_PREDICT_TRUE`) to `tvm/ffi/base_details.h` and applies them to a small 
audited set of error-only helpers across `libtvm_ffi.so` and the Cython 
extension. No CMake changes. Downstream consumers that include 
`tvm/ffi/base_details.h` get the macros automatically and can apply them to 
their own helpers (notably TVM).
   
   ## Why
   
   A binary-layout audit of `libtvm_ffi.so` found that internal error helpers 
(`ErrorBuilder` ctors/dtor) live in the middle of `.text`, interleaved with hot 
C ABI dispatch and container code. They run only on error or one-time startup 
paths, so moving them out of the hot instruction stream improves icache 
locality without changing behavior.
   
   ## What
   
   ```cpp
   // include/tvm/ffi/base_details.h
   
   #if defined(__GNUC__) || defined(__clang__)
   #define TVM_FFI_COLD_CODE [[gnu::cold]]
   #else
   #define TVM_FFI_COLD_CODE
   #endif
   
   #if defined(__GNUC__) || defined(__clang__)
   #define TVM_FFI_PREDICT_FALSE(cond) (__builtin_expect(static_cast<bool>(cond), 0))
   #define TVM_FFI_PREDICT_TRUE(cond)  (__builtin_expect(static_cast<bool>(cond), 1))
   #else
   #define TVM_FFI_PREDICT_FALSE(cond) (cond)
   #define TVM_FFI_PREDICT_TRUE(cond)  (cond)
   #endif
   ```
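
A downstream helper could pick up the same pattern roughly as follows. This is a minimal sketch, not code from the PR: the `DEMO_*` macros are local stand-ins for the macros above so the snippet compiles without the tvm-ffi headers, and `ThrowNegativeInput` and `CheckedDouble` are hypothetical names.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Local stand-ins for the macros above (assumption: same definitions, different names).
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_COLD_CODE [[gnu::cold]]
#define DEMO_PREDICT_FALSE(cond) (__builtin_expect(static_cast<bool>(cond), 0))
#else
#define DEMO_COLD_CODE
#define DEMO_PREDICT_FALSE(cond) (cond)
#endif

// Hypothetical error-only helper: reached only when a check fails,
// so it is a candidate for the cold attribute.
DEMO_COLD_CODE [[noreturn]] inline void ThrowNegativeInput(int value) {
  throw std::invalid_argument("negative input: " + std::to_string(value));
}

// Hypothetical hot-path function: the failure branch is hinted as unlikely,
// which lets the compiler keep the success path as the fall-through.
inline int CheckedDouble(int value) {
  if (DEMO_PREDICT_FALSE(value < 0)) {
    ThrowNegativeInput(value);
  }
  return value * 2;
}
```

On compilers other than GCC and Clang both stand-ins expand to nothing or to the bare condition, so the function behaves identically either way.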
   
   `TVM_FFI_COLD_CODE` is applied only to functions that run exclusively on 
error / segfault / process-startup paths — never on regular teardown:
   
   - `details::ErrorBuilder` ctors and the `[[noreturn]]` destructor
   - `TVMFFISegFaultHandler` (internal)
   - `TVMFFIInstallSignalHandler` (startup-only)
   - `TVMFFIPyCallManager::ForwardPyErrorToFFI` (Python error forwarding)
   
   `TVMFFIPyCallbackClosure::Deleter` is intentionally NOT cold — deleters run 
on every callback destruction, which is normal-lifecycle frequency.
   
   C ABI exports stay hot, in keeping with cross-DSO surface hygiene. The 
`TVMFFIError*` family, `TVMFFIBacktrace`, and the `SafeCallContext` setter 
methods all remain in the hot region: callers and tools expect them as ordinary 
entry points, and once an error path enters them they should be fast (the TLS 
setter should not be size-optimized).
   
   `TVM_FFI_PREDICT_FALSE` is applied to the central choke points for error 
checking: `TVM_FFI_CHECK_SAFE_CALL`, `TVM_FFI_CHECK`, 
`GlobalFunctionTable::Update`'s already-registered branch, and ~17 error-check 
branches inside the Python→FFI dispatchers in `tvm_ffi_python_helpers.h`. 
`TVM_FFI_PREDICT_TRUE` is used once, on the dispatch-map cache-hit branch, 
which is taken on every call but the first.
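
The cache-hit hint can be illustrated with a small stand-alone sketch. `DemoDispatchCache` below is a hypothetical class written for this example, not the dispatch map used in tvm-ffi, and `DEMO_PREDICT_TRUE` is a local stand-in for `TVM_FFI_PREDICT_TRUE`.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Local stand-in for TVM_FFI_PREDICT_TRUE (assumption: same definition).
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_PREDICT_TRUE(cond) (__builtin_expect(static_cast<bool>(cond), 1))
#else
#define DEMO_PREDICT_TRUE(cond) (cond)
#endif

// Hypothetical dispatch cache keyed by a type name.
class DemoDispatchCache {
 public:
  // Returns the handler id for `key`, populating the cache on first use.
  int Lookup(const std::string& key) {
    auto it = table_.find(key);
    if (DEMO_PREDICT_TRUE(it != table_.end())) {
      return it->second;  // warm path: every call for this key after the first
    }
    return Populate(key);  // miss path: first call for this key
  }

  int miss_count() const { return miss_count_; }

 private:
  int Populate(const std::string& key) {
    ++miss_count_;
    int handler = static_cast<int>(table_.size());  // next free handler id
    table_.emplace(key, handler);
    return handler;
  }

  std::unordered_map<std::string, int> table_;
  int miss_count_ = 0;
};
```

The hint does not change what `Lookup` returns; it only tells the compiler which side of the branch to lay out as the straight-line path.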
   
   ## Mechanism
   
   GCC and Clang emit cold-marked functions into per-TU `.text.unlikely` 
sections. The default GNU linker script's `*(.text.unlikely .text.*_unlikely 
.text.unlikely.*)` rule gathers them into a contiguous slot inside `.text`. No 
`-ffunction-sections` flag required — cold separation works with the default 
build. On MSVC the macros are no-ops and the code is byte-identical to before.
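
One way to observe this placement is to compile a tiny translation unit and look at the symbol table. The sketch below assumes a file named `demo.cc` and an optimizing GCC build; `ReportFailure` and `AddOne` are hypothetical functions, and `DEMO_COLD_CODE` is a local stand-in for `TVM_FFI_COLD_CODE`.

```cpp
#include <cassert>
#include <cstdio>

// Local stand-in for TVM_FFI_COLD_CODE (assumption: same definition).
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_COLD_CODE [[gnu::cold]]
#else
#define DEMO_COLD_CODE
#endif

// Hypothetical error reporter. With GCC at -O2 this is expected to be
// emitted into a .text.unlikely section rather than plain .text.
DEMO_COLD_CODE int ReportFailure(int code) {
  std::fprintf(stderr, "demo failure, code=%d\n", code);
  return -1;
}

// Hypothetical hot function. It stays in the regular text section; the
// compiler also treats the path that calls a cold function as unlikely.
int AddOne(int x) {
  if (x < 0) return ReportFailure(x);
  return x + 1;
}

// To inspect placement (file name is an assumption):
//   g++ -O2 -c demo.cc -o demo.o
//   objdump -t demo.o | grep -E 'ReportFailure|AddOne'
// The section column should differ between the two symbols.
```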
   
   ## Measured impact
   
   Stripped `libtvm_ffi.so`, Release / GCC 11.4 / ld.bfd 2.38 / x86_64 Linux:
   
   | Build                              | Stripped size | Delta       |
   |------------------------------------|--------------:|------------:|
   | baseline (both macros no-op)       |     1,887,800 |          —  |
   | cold attribute only                |     1,834,568 |  -53,232 B  |
   | predict macros only                |     1,908,280 |  +20,480 B  |
   | both (this PR)                     |     1,842,728 |  -45,072 B  |
   
   The size win is dominated by `[[gnu::cold]]` triggering size-optimizing 
codegen on cold function bodies. The branch-prediction macros affect layout 
only: they grow the binary (+20,480 B on their own, about 8 KB on top of the 
cold-only build) but improve hot-path basic-block contiguity.
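
The deltas in the table can be re-derived from the stripped sizes. The following sketch only restates that arithmetic at compile time; the constant names are made up for the example.

```cpp
#include <cassert>

// Stripped sizes in bytes, copied from the table above.
constexpr long kBaseline = 1887800;
constexpr long kColdOnly = 1834568;
constexpr long kPredictOnly = 1908280;
constexpr long kBoth = 1842728;

// Deltas relative to the baseline build.
constexpr long kColdDelta = kColdOnly - kBaseline;
constexpr long kPredictDelta = kPredictOnly - kBaseline;
constexpr long kBothDelta = kBoth - kBaseline;

// Cost of the predict macros when added on top of the cold-only build.
constexpr long kPredictOnTopOfCold = kBoth - kColdOnly;

static_assert(kColdDelta == -53232, "cold attribute only");
static_assert(kPredictDelta == 20480, "predict macros only");
static_assert(kBothDelta == -45072, "both (this PR)");
static_assert(kPredictOnTopOfCold == 8160, "roughly 8 KB on top of cold-only");
```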
   
   Cold cluster on `libtvm_ffi.so`: about 103 KiB at the head of `.text` (~7.3% 
of `.text`), all the error helpers plus compiler-emitted `.cold` split-bodies 
clustered together.
   
   Cython extension `core.abi3.so`: stripped size unchanged (page-alignment 
padding absorbs the +433 B `.text` delta). Cold cluster includes 
`ForwardPyErrorToFFI` and ~10 auto-cold `.cold` thunks from large Pyx wrappers.
   
   ## Performance
   
   `benchmark_dlpack.py` CPU-only subset, two trials each, median:
   
   | benchmark                          | baseline  | with markers | delta   |
   |------------------------------------|----------:|-------------:|--------:|
   | `nop(tvm_tensor x3)`               | 112.8 ns  |   112.2 ns   | -0.49%  |
   | `nop.autodlpack(torch[cpu])`       | 308.6 ns  |   303.4 ns   | -1.69%  |
   | `nop.autodlpack(numpy)`            | 939.5 ns  |   926.6 ns   | -1.37%  |
   | `nop+from_dlpack(torch)`           | 791.5 ns  |   787.0 ns   | -0.56%  |
   | `nop(int x3)`                      | 133.4 ns  |   133.3 ns   | -0.04%  |
   | `nop()`                            |  90.4 ns  |    89.2 ns   | -1.33%  |
   | `__dlpack__()`                     |  84.8 ns  |    84.4 ns   | -0.41%  |
   
   All within ±2% run-to-run noise. No regression.
   
   ## ABI / portability
   
   No ABI changes. The macros are header-only, and the only observable 
difference is the hints they pass to the compiler. On MSVC every macro is a 
no-op (byte-identical codegen). On GCC and Clang, the cold attribute lowers 
function-entry alignment and triggers `-Os`-style codegen on the marked body; 
the branch-prediction macros only reorder basic blocks within the function.
   
   ## Test plan
   
   - [x] 355/355 active C++ tests pass.
   - [x] Python smoke test: `import tvm_ffi; print(tvm_ffi.__version__)` succeeds.
   - [x] `benchmark_dlpack.py` CPU subset shows no regression.
   - [x] Pre-commit clean.
   - [x] clang-tidy clean.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

