cbalint13 opened a new issue, #18224:
URL: https://github.com/apache/tvm/issues/18224

   
   Failures are encountered when tuning a small example with tensorization.
   This also affects our work in [PR#18182](https://github.com/apache/tvm/pull/18182), where this issue was discovered.
   
   ----
   #### Reproducer
   - Here is a standalone minimal program reproducing the issue: [x86-dense-relax-metaschedule.py](https://github.com/user-attachments/files/21943488/x86-dense-relax-metaschedule.py)
   - Swapping L146 (enabling) and L151 (disabling) makes the program run, and the schedules are tensorized as expected.
   
   On **success**, the program finds and prints the tensorized IR from the tuning database.
   On **failure** (with many obvious errors inside the tuner), the program finds zero tensorized schedules.
   
   
   #### Description
   * The failure seems to come from a subtle difference in the IR produced by `bb.emit_te(tvm.topi.nn.dense)` vs `bb.emit(relax.op.matmul)`; a minimal sketch of the two construction paths is given below.
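   For context, here is a minimal sketch of the two construction paths. This is only an illustration under assumed shapes/dtypes taken from the IR below; the attached reproducer script may differ in its details.

   ```python
   import tvm
   from tvm import relax, topi

   def build(use_te: bool) -> tvm.IRModule:
       bb = relax.BlockBuilder()
       data = relax.Var("data", relax.TensorStructInfo((4, 4), "uint8"))
       weight = relax.Var("weight", relax.TensorStructInfo((4, 4), "int8"))
       with bb.function("main", [data, weight]):
           with bb.dataflow():
               if use_te:
                   # TE/TOPI path ("good"): lowers to a dense prim_func like dense1
                   # below, with weight treated as (N, K), i.e. read as weight[j, k]
                   out = bb.emit_te(topi.nn.dense, data, weight, None, "int32")
               else:
                   # relax op path ("bad"): a high-level matmul that only becomes a
                   # TIR prim_func (like matmul1 below, weight read as weight[k, j])
                   # after relax.transform.LegalizeOps()
                   out = bb.emit(relax.op.matmul(data, weight, out_dtype="int32"))
               gv = bb.emit_output(out)
           bb.emit_func_output(gv)
       return bb.get()
   ```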
   
   #### Investigation
   * I investigated and found a very subtle difference between the good and bad Relax IR:
   ```
   $ diff -Nru relax-ir-te.txt relax-ir-pp.txt
   --- relax-ir-te.txt  2025-08-22 21:42:34.567064474 +0300
   +++ relax-ir-pp.txt  2025-08-22 21:46:34.309096420 +0300
   @@ -5,22 +5,22 @@
    @I.ir_module
    class Module:
        @T.prim_func(private=True)
   -    def dense1(data: T.Buffer((T.int64(4), T.int64(4)), "uint8"), weight: T.Buffer((T.int64(4), T.int64(4)), "int8"), T_matmul_NT: T.Buffer((T.int64(4), T.int64(4)), "int32")):
   +    def matmul1(data: T.Buffer((T.int64(4), T.int64(4)), "uint8"), weight: T.Buffer((T.int64(4), T.int64(4)), "int8"), matmul: T.Buffer((T.int64(4), T.int64(4)), "int32")):
            T.func_attr({"layout_free_buffers": [], "op_pattern": 4, "tir.noalias": True})
            # with T.block("root"):
            for i0, i1, k in T.grid(T.int64(4), T.int64(4), T.int64(4)):
   -            with T.block("T_matmul_NT"):
   +            with T.block("matmul"):
                    v_i0, v_i1, v_k = T.axis.remap("SSR", [i0, i1, k])
   -                T.reads(data[v_i0, v_k], weight[v_i1, v_k])
   -                T.writes(T_matmul_NT[v_i0, v_i1])
   +                T.reads(data[v_i0, v_k], weight[v_k, v_i1])
   +                T.writes(matmul[v_i0, v_i1])
                    with T.init():
   -                    T_matmul_NT[v_i0, v_i1] = 0
   -                T_matmul_NT[v_i0, v_i1] = T_matmul_NT[v_i0, v_i1] + T.Cast("int32", data[v_i0, v_k]) * T.Cast("int32", weight[v_i1, v_k])
   +                    matmul[v_i0, v_i1] = 0
   +                matmul[v_i0, v_i1] = matmul[v_i0, v_i1] + T.Cast("int32", data[v_i0, v_k]) * T.Cast("int32", weight[v_k, v_i1])
    
        @R.function
        def main(data: R.Tensor((4, 4), dtype="uint8"), weight: R.Tensor((4, 4), dtype="int8")) -> R.Tensor((4, 4), dtype="int32"):
            cls = Module
            with R.dataflow():
   -            gv = R.call_tir(cls.dense1, (data, weight), out_sinfo=R.Tensor((4, 4), dtype="int32"))
   +            gv = R.call_tir(cls.matmul1, (data, weight), out_sinfo=R.Tensor((4, 4), dtype="int32"))
                R.output(gv)
            return gv
   ```
   
   I see only naming differences, no structural or other differences in the IR.
   Looking at the differences [(-) good, (+) bad] I cannot tell what the issue is; a quick structural check is sketched below.
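
   To double-check whether the two prim_funcs really differ beyond naming, a structural comparison can help. This is a minimal sketch; `mod_te` and `mod_pp` are hypothetical handles to the two IRModules behind the files diffed above.

   ```python
   import tvm

   # Hypothetical handles: mod_te / mod_pp are the IRModules behind
   # relax-ir-te.txt and relax-ir-pp.txt shown in the diff above.
   f_te = mod_te["dense1"]   # prim_func from the bb.emit_te(tvm.topi.nn.dense) path
   f_pp = mod_pp["matmul1"]  # prim_func from the bb.emit(relax.op.matmul) path

   # Compare the two lowered prim_funcs up to free-variable renaming. Note that
   # some name fields (e.g. block name hints) may still take part in the
   # comparison, so interpret the result together with the textual diff above.
   print(tvm.ir.structural_equal(f_te, f_pp, map_free_vars=True))
   ```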
   
   My inclination is that a pass is missing, but which one? Or is there a bug in the TIR processing?
   
   Cc @MasterJH5574 @Hzfengsy @mshr-h @tqchen 
   Cc @fzi-peccia 
   
   Thank you!
   

