The GitHub Actions job "tvm-bot" on tvm.git/main has succeeded.
Run started by GitHub user cchung100m (triggered by cchung100m).

Head commit for run:
46a9a00c8e2ba2c69c539a46412bea0f7f5b9ac7 / Bohan Hou <[email protected]>
[DOCS][TIRX] Add in-kernel profiling (CudaProfiler) tutorial (#19895)

This adds an in-kernel profiling page to the TIRx native-basics CUDA section,
documenting the existing `tvm.tirx.bench.CudaProfiler`.

The page covers:

- a minimal load / compute / store example using `start` / `end` / `finalize`
  markers and a user-supplied `uint64` buffer;
- decoding the record buffer on the host and exporting a Perfetto trace via
  `export_to_perfetto_trace`;
- the record/tag encoding and the device code each call lowers to (a
  `%globaltimer` read, a leader-only global store, and a block fence);
- usage notes: one leader per `(block, group)`, buffer sizing, the 32-bit
  `%globaltimer` wrap, and the per-region cost.
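
The 32-bit wrap mentioned in the usage notes can be illustrated with a short
host-side sketch. It assumes each record keeps only the low 32 bits of the
timer and that a region spans at most one wrap. `elapsed_ticks` is a
hypothetical helper written for this illustration, not part of
`tvm.tirx.bench`.

```python
MASK32 = 0xFFFFFFFF  # assumption: timestamps are truncated to 32 bits


def elapsed_ticks(start: int, end: int) -> int:
    """Wrap-safe difference between two 32-bit timestamps.

    Hypothetical helper: subtracting modulo 2**32 gives the right duration
    even when the counter rolled over once between `start` and `end`.
    """
    return (end - start) & MASK32


# No wrap: plain subtraction.
print(elapsed_ticks(1_000, 5_000))            # 4000
# One wrap: the counter passed 2**32 between the two reads.
print(elapsed_ticks(0xFFFFFF00, 0x00000100))  # 512
```

If the ticks are nanoseconds, a 32-bit counter wraps roughly every 4.29
seconds, so this correction only holds for regions shorter than that.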

The example is tested end-to-end on a CUDA GPU (B200, sm_100). It is wired into
the `native_basics.rst` toctree after "Compiling and inspecting". The
FlashAttention-4 timeline screenshot is served from `tlc-pack/web-data`
(`images/tirx/tirx_cudaprofiler_fa4.png`), matching the other tirx doc figures.

Report URL: https://github.com/apache/tvm/actions/runs/28293033480

With regards,
GitHub Actions via GitBox


