The GitHub Actions job "tvm-bot" on tvm.git/main has succeeded. Run started by GitHub user cchung100m (triggered by cchung100m).
Head commit for run:
46a9a00c8e2ba2c69c539a46412bea0f7f5b9ac7 / Bohan Hou <[email protected]>
[DOCS][TIRX] Add in-kernel profiling (CudaProfiler) tutorial (#19895)

This adds an in-kernel profiling page to the TIRx native-basics CUDA
section, documenting the existing `tvm.tirx.bench.CudaProfiler`.

The page covers:
- a minimal load / compute / store example using `start` / `end` /
  `finalize` markers and a user-supplied `uint64` buffer;
- decoding the record buffer on the host and exporting a Perfetto trace
  via `export_to_perfetto_trace`;
- the record/tag encoding and the device code each call lowers to (a
  `%globaltimer` read, a leader-only global store, and a block fence);
- usage notes: one leader per `(block, group)`, buffer sizing, the 32-bit
  `%globaltimer` wrap, and the per-region cost.

The example is tested end-to-end on a CUDA GPU (B200, sm_100). It is wired
into the `native_basics.rst` toctree after "Compiling and inspecting".

The FlashAttention-4 timeline screenshot is served from
`tlc-pack/web-data` (`images/tirx/tirx_cudaprofiler_fa4.png`), matching
the other tirx doc figures.

Report URL: https://github.com/apache/tvm/actions/runs/28293033480

With regards,
GitHub Actions via GitBox
