The GitHub Actions job "Teams" on tvm.git/main has succeeded.
Run started by GitHub user raindrops-0199 (triggered by raindrops-0199).

Head commit for run:
2a16d98b9ec1899f11c35a70c779285812d360c2 / Shushi Hong <[email protected]>
[S-TIR] Fix software pipeline offsets for legacy MMA intrinsics (#19742)

This PR fixes `InjectSoftwarePipeline` to rewrite opaque buffer offsets
for legacy PTX MMA intrinsics, including `ptx_ldmatrix_legacy`,
`ptx_mma_legacy`, `mma_store_legacy`, and `mma_fill_legacy`.

### Failure

`test_async_nested_pipeline_mma_gemm_ideal_annotation` failed with a
real numerical mismatch during the final GEMM result check:

```text
Mismatched elements: 6744298 / 16777216 (40.2%)
Max absolute difference: 6.706421
Max relative difference: 0.00653978
```

A diagnostic run with a fixed seed reproduced the same class of error:

```text
nested_full bad 6826830 / 16777216
max_abs 7.0405273
mean_abs 0.9874781
```

The adjacent simple pipeline test passed, and disabling async copy did
not change the mismatch, so this was not a cp.async lowering issue,
tolerance noise, or GPU flakiness.

### Root Cause

`InjectSoftwarePipeline` may add a leading version dimension to pipeline
buffers. Normal buffer loads/stores and newer opaque PTX intrinsics
already had their offsets rewritten to include the pipeline version
slot.
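
As a rough illustration of what "including the pipeline version slot" means
for a flat element offset, here is a minimal sketch. It is not the pass's
actual code: `rewrite_offset`, `version_slot`, and `elems_per_version` are
hypothetical names, and selecting the slot as the loop index modulo the
number of versions is an assumption made for this example.

```python
def version_slot(loop_index: int, num_versions: int) -> int:
    """Pick the version slot for a loop iteration (assumed: round-robin)."""
    return loop_index % num_versions


def rewrite_offset(offset: int, version: int, elems_per_version: int) -> int:
    """Shift a flat offset into the slot of the given pipeline version.

    The buffer is modeled as having a leading version dimension, so each
    version occupies a contiguous block of `elems_per_version` elements.
    """
    return version * elems_per_version + offset


# An access at offset 4 in a 32-element fragment, for three iterations of a
# double-buffered (2-version) pipeline.
offsets = [rewrite_offset(4, version_slot(i, 2), 32) for i in range(3)]
print(offsets)  # [4, 36, 4]
```

An offset that skips this rewrite always lands in version 0, whichever
iteration issues the access.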

However, the legacy MMA intrinsics were not covered. In this test, the
warp buffers became multi-versioned, but the legacy `ldmatrix`/`mma`
offsets still pointed to the original slot. As a result, the second
`ldmatrix` stage overwrote the first warp fragment, and both MMA stages
read the same fragment. That skipped one K fragment and duplicated
another, producing the numerical mismatch above.
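
The skip-and-duplicate effect can be reproduced with a toy model. This is a
simplified sketch, not TVM code: each K fragment is reduced to a single
partial product, the warp buffer holds two versions, and `pipelined_sum` and
`rewrite_offsets` are hypothetical names introduced for illustration.

```python
def pipelined_sum(partials, rewrite_offsets):
    """Accumulate K-fragment partial products through a 2-version buffer.

    With rewrite_offsets=False, every load and every read uses slot 0,
    mimicking legacy intrinsics whose offsets ignore the version dimension.
    Assumes len(partials) is a multiple of the number of versions.
    """
    num_versions = 2
    warp_buf = [0.0] * num_versions
    acc = 0.0
    for k in range(0, len(partials), num_versions):
        # Load stages: one fragment per version slot.
        for v in range(num_versions):
            slot = v if rewrite_offsets else 0
            warp_buf[slot] = partials[k + v]
        # MMA stages: each should consume its own version slot.
        for v in range(num_versions):
            slot = v if rewrite_offsets else 0
            acc += warp_buf[slot]
    return acc


partials = [1.0, 2.0, 3.0, 4.0]
print(pipelined_sum(partials, rewrite_offsets=True))   # 10.0
print(pipelined_sum(partials, rewrite_offsets=False))  # 12.0
```

Without the rewrite, fragments 1.0 and 3.0 are overwritten before they are
read, and 2.0 and 4.0 are each accumulated twice, so the result is wrong
even though every individual operation executed.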

Report URL: https://github.com/apache/tvm/actions/runs/27399094142

With regards,
GitHub Actions via GitBox
