The GitHub Actions job "tvm-bot" on tvm.git/main has succeeded.
Run started by GitHub user Aharrypotter (triggered by Aharrypotter).

Head commit for run:
446bd2dbf0f480718ad7d8ad64f59ebb9aa9c4cf / Soowon Jeong <[email protected]>
[BugFix][S-TIR] Wrap bare scalar bodies in DefaultGPUSchedule to avoid root-block crash (#19514)

## Problem

Closes #17873.

`DefaultGPUSchedule` crashes when a PrimFunc body is a bare
`SBlockRealize` (a fully-scalar op with no enclosing loops and no iter
vars):

```
ValueError: Check failed: (sref->parent != nullptr) is false:
  Cannot add loops on top of the root block
```

Minimal repro (TVMScript decorators are omitted in this snippet to
satisfy the PR-body lint; the regression test uses the regular
`T.prim_func` form):

```
ir_module:
  prim_func main(a: Buffer((), "float32"),
                 b: Buffer((), "float32"),
                 c: Buffer((), "float32")):
      func_attr({"target": target("nvidia/geforce-rtx-3080")})
      with sblock("scalar_add"):
          c[()] = a[()] + b[()]

s_tir.transform.DefaultGPUSchedule()(M)  # crashes
```

## Root Cause

The realized `scalar_add` block is itself the root block of the prim_func
body, so its sref has no parent statement to mutate. `ThreadBind`
(`src/s_tir/transform/default_gpu_schedule.cc`) reaches the
`loops.empty()` branch and calls `sch->AddUnitLoop(block)`, which fails
the `sref->parent != nullptr` check in `s_tir::AddUnitLoop`
(`src/s_tir/schedule/primitive/loop_transformation.cc:1166`).

The schedule infrastructure additionally requires the prim_func body
to be an `SBlockRealize` whose block is the function's root
(`GetRootPrimFunc` in `src/s_tir/schedule/analysis/analysis.cc:53`),
so the body cannot simply be wrapped in a top-level `For`.
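
As a rough illustration of the failing check, the sketch below models an sref as a node with a parent link. `ToySRef` and `toy_add_unit_loop` are hypothetical names invented for this example; they are not TVM APIs and only mimic the `sref->parent != nullptr` condition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToySRef:
    """Hypothetical stand-in for a schedule sref: a name plus a parent link."""
    name: str
    parent: Optional["ToySRef"] = None


def toy_add_unit_loop(sref: ToySRef) -> ToySRef:
    """Insert a unit loop between `sref` and its parent (toy model only)."""
    if sref.parent is None:
        # Mirrors the failing check: the root has no parent statement to rewrite.
        raise ValueError("Cannot add loops on top of the root block")
    loop = ToySRef(name=f"unit_loop({sref.name})", parent=sref.parent)
    sref.parent = loop
    return loop


# Bare scalar body: the block is the root, so there is no parent.
bare = ToySRef("scalar_add")
try:
    toy_add_unit_loop(bare)
    bare_error = None
except ValueError as err:
    bare_error = str(err)

# The same block nested under a separate root: the loop has somewhere to go.
root = ToySRef("root")
nested = ToySRef("scalar_add", parent=root)
loop = toy_add_unit_loop(nested)
```

In this toy model the only difference between the two cases is whether a separate root sits above the block, which is the situation the fix sets up.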

## Fix

Before constructing the schedule, rewrite GPU-bound PrimFuncs whose
body is a bare-leaf `SBlockRealize` so the realized block is no longer
the root. The wrap conditions are intentionally narrow:

1. `func->body` is `SBlockRealize`,
2. the realized block has empty `iter_vars`, and
3. the block's body is not `For` or `SBlockRealize` (i.e. it is a leaf
   computation, not the well-formed implicit root that wraps a loop
   nest produced by the rest of the pipeline).
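
The three conditions above can be sketched as a predicate over a toy IR. The dataclasses and `needs_wrap` below are assumptions made for illustration; they borrow the node names from this description but are not the TVM classes or the C++ code in the patch.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Leaf:
    """Toy leaf computation, e.g. a buffer store."""
    text: str


@dataclass
class For:
    var: str
    body: "Stmt"


@dataclass
class SBlock:
    name: str
    body: "Stmt"
    iter_vars: List[str] = field(default_factory=list)


@dataclass
class SBlockRealize:
    block: SBlock
    iter_values: List[str] = field(default_factory=list)


Stmt = Union[Leaf, For, SBlockRealize]


def needs_wrap(body: Stmt) -> bool:
    """Return True only when all three wrap conditions hold."""
    if not isinstance(body, SBlockRealize):   # condition 1
        return False
    if body.block.iter_vars:                  # condition 2
        return False
    # Condition 3: the block body is a leaf, not a loop nest or nested block.
    return not isinstance(body.block.body, (For, SBlockRealize))


# Bare scalar body, as in the repro.
bare = SBlockRealize(SBlock("scalar_add", Leaf("c[()] = a[()] + b[()]")))

# Well-formed implicit root wrapping a loop nest: must be left alone.
inner = SBlockRealize(SBlock("add", Leaf("..."), iter_vars=["vi"]), ["i"])
implicit_root = SBlockRealize(SBlock("root", For("i", inner)))
```

Here `needs_wrap(bare)` is true while `needs_wrap(implicit_root)` is false, since the implicit root's body is a `For`.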

When all three hold, the body becomes:

```
SBlockRealize(
  block=SBlock("root", body=
    For(u, 0, 1, kSerial,
      SBlockRealize(iter_values=[u],
        block=<original block, iter_vars=[IterVar(0..1, vu, kDataPar)]>))))
```

The synthesised 1-extent data-parallel iter keeps
`iter_values.size() == iter_vars.size()` for downstream checks, and the
new For loop gives `ThreadBind` a real loop to bind to `blockIdx.x` /
`threadIdx.x`. Already-scheduled functions and host-only PrimFuncs are
skipped via the existing `IsScheduledOnGPU` / `kIsScheduled` gating.
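
A minimal sketch of the rewrite itself, again over a toy IR rather than TVM's classes. The dataclasses and `wrap_bare_scalar` are hypothetical; only the resulting shape (new `root` block, unit `For`, one synthesised data-parallel iter) follows the description above.

```python
from dataclasses import dataclass, replace
from typing import Any, Tuple


@dataclass(frozen=True)
class IterVar:
    name: str
    extent: int
    kind: str


@dataclass(frozen=True)
class SBlock:
    name: str
    body: Any
    iter_vars: Tuple[IterVar, ...] = ()


@dataclass(frozen=True)
class SBlockRealize:
    block: SBlock
    iter_values: Tuple[str, ...] = ()


@dataclass(frozen=True)
class For:
    loop_var: str
    min: int
    extent: int
    kind: str
    body: Any


def wrap_bare_scalar(body: SBlockRealize) -> SBlockRealize:
    """Nest a bare scalar block under a new root and a unit loop (toy model)."""
    # Give the original block one 1-extent data-parallel iter var ...
    inner_block = replace(body.block, iter_vars=(IterVar("vu", 1, "kDataPar"),))
    # ... bound to the new loop variable, so both lists have the same length.
    inner = SBlockRealize(block=inner_block, iter_values=("u",))
    loop = For("u", 0, 1, "kSerial", inner)
    # The new root block has no iter vars and wraps the loop.
    return SBlockRealize(block=SBlock("root", loop))


original = SBlockRealize(SBlock("scalar_add", "c[()] = a[()] + b[()]"))
wrapped = wrap_bare_scalar(original)
wrapped_inner = wrapped.block.body.body
```

After wrapping, the root's body is a `For`, so the third wrap condition no longer holds and the function would not be wrapped a second time.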

## Testing

```
pytest tests/python/s_tir/transform/test_s_tir_transform_default_gpu_schedule.py
```

10 passed (9 existing + 1 new `test_scalar_block_no_loops`). End-to-end
compile + execute on RTX 3080 (sm_86): the scalar repro returns the
expected `2.0 + 3.0 = 5.0`.

Report URL: https://github.com/apache/tvm/actions/runs/25453061409

With regards,
GitHub Actions via GitBox

