cbalint13 opened a new issue, #18224: URL: https://github.com/apache/tvm/issues/18224
Failures are encountered when tuning a small example with tensorization. This also affects our work in [PR#18182](https://github.com/apache/tvm/pull/18182), where this issue was discovered.

----

#### Reproducer

- Here is a standalone minimal program reproducing the issue: [x86-dense-relax-metaschedule.py](https://github.com/user-attachments/files/21943488/x86-dense-relax-metaschedule.py)
- Swapping out L146 (enabling) and L151 (disabling) makes the program run, and the schedules are tensorized as expected.

On **success**, the program finds and prints the tensorized IR from the tuning database. On **failure** (with many obvious errors inside the tuner), the program finds zero tensorized schedules.

#### Description

* The failure seems to come from a subtle difference in the IR originating from `bb.emit_te(tvm.topi.nn.dense)` vs. `bb.emit(relax.op.matmul)` (see the construction sketch in the appendix below).

#### Investigation

* I investigated and found a very subtle difference between the good/bad Relax IR:

```
$ diff -Nru relax-ir-te.txt relax-ir-pp.txt
--- relax-ir-te.txt	2025-08-22 21:42:34.567064474 +0300
+++ relax-ir-pp.txt	2025-08-22 21:46:34.309096420 +0300
@@ -5,22 +5,22 @@
 @I.ir_module
 class Module:
     @T.prim_func(private=True)
-    def dense1(data: T.Buffer((T.int64(4), T.int64(4)), "uint8"), weight: T.Buffer((T.int64(4), T.int64(4)), "int8"), T_matmul_NT: T.Buffer((T.int64(4), T.int64(4)), "int32")):
+    def matmul1(data: T.Buffer((T.int64(4), T.int64(4)), "uint8"), weight: T.Buffer((T.int64(4), T.int64(4)), "int8"), matmul: T.Buffer((T.int64(4), T.int64(4)), "int32")):
         T.func_attr({"layout_free_buffers": [], "op_pattern": 4, "tir.noalias": True})
         # with T.block("root"):
         for i0, i1, k in T.grid(T.int64(4), T.int64(4), T.int64(4)):
-            with T.block("T_matmul_NT"):
+            with T.block("matmul"):
                 v_i0, v_i1, v_k = T.axis.remap("SSR", [i0, i1, k])
-                T.reads(data[v_i0, v_k], weight[v_i1, v_k])
-                T.writes(T_matmul_NT[v_i0, v_i1])
+                T.reads(data[v_i0, v_k], weight[v_k, v_i1])
+                T.writes(matmul[v_i0, v_i1])
                 with T.init():
-                    T_matmul_NT[v_i0, v_i1] = 0
-                T_matmul_NT[v_i0, v_i1] = T_matmul_NT[v_i0, v_i1] + T.Cast("int32", data[v_i0, v_k]) * T.Cast("int32", weight[v_i1, v_k])
+                    matmul[v_i0, v_i1] = 0
+                matmul[v_i0, v_i1] = matmul[v_i0, v_i1] + T.Cast("int32", data[v_i0, v_k]) * T.Cast("int32", weight[v_k, v_i1])
 
     @R.function
     def main(data: R.Tensor((4, 4), dtype="uint8"), weight: R.Tensor((4, 4), dtype="int8")) -> R.Tensor((4, 4), dtype="int32"):
         cls = Module
         with R.dataflow():
-            gv = R.call_tir(cls.dense1, (data, weight), out_sinfo=R.Tensor((4, 4), dtype="int32"))
+            gv = R.call_tir(cls.matmul1, (data, weight), out_sinfo=R.Tensor((4, 4), dtype="int32"))
             R.output(gv)
         return gv
```

Looking at the differences [(-)good, (+)bad]: apart from the renaming, the only change I can spot is the transposed `weight` access (`weight[v_i1, v_k]` for `dense` vs. `weight[v_k, v_i1]` for `matmul`, i.e. NT vs. NN layout); I see no other structural difference in the IR (see the structural-equality cross-check in the appendix below). I cannot tell from this what the issue is. I suspect a missing pass, but which one? Or is there a bug in the TIR processing?

Cc @MasterJH5574 @Hzfengsy @mshr-h @tqchen
Cc @fzi-peccia

Thank you!
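#### Appendix: sketch of the two construction paths

A minimal sketch (this is not the attached reproducer; the variable names, shapes, and dtypes are taken from the IR diff above, everything else is assumed) contrasting the two ways the `main` function is built:

```python
from tvm import relax, topi

bb = relax.BlockBuilder()
data = relax.Var("data", relax.TensorStructInfo((4, 4), "uint8"))
weight = relax.Var("weight", relax.TensorStructInfo((4, 4), "int8"))

with bb.function("main", [data, weight]):
    with bb.dataflow():
        # Path A (good): lowers through TOPI right away and yields the
        # "T_matmul_NT" block, reading weight[v_i1, v_k] (NT layout).
        lv = bb.emit_te(topi.nn.dense, data, weight, None, "int32")
        # Path B (bad): emits the high-level op; only after
        # relax.transform.LegalizeOps() does it become the "matmul" block,
        # reading weight[v_k, v_i1] (NN layout).
        # lv = bb.emit(relax.op.matmul(data, weight, out_dtype="int32"))
        gv = bb.emit_output(lv)
    bb.emit_func_output(gv)

bb.get().show()
```

With path A, `main` carries `call_tir(cls.dense1, ...)` immediately; with path B it carries a high-level `R.matmul` that becomes the `matmul1` PrimFunc only after legalization.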

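#### Appendix: structural-equality cross-check

To double-check the "naming-only vs. structural" question programmatically: as far as I know, `tvm.ir.structural_equal` ignores variable and block name hints, so comparing the two compute bodies directly shows whether anything beyond naming differs. A self-contained sketch, with both PrimFuncs transcribed from the diff above (function attributes elided; `T.reads`/`T.writes` left for TVMScript to infer):

```python
import tvm
from tvm.script import tir as T

@T.prim_func(private=True)
def dense_nt(data: T.Buffer((T.int64(4), T.int64(4)), "uint8"),
             weight: T.Buffer((T.int64(4), T.int64(4)), "int8"),
             out: T.Buffer((T.int64(4), T.int64(4)), "int32")):
    # Transcribed from the (-) side: weight read as weight[v_i1, v_k] (NT).
    for i0, i1, k in T.grid(T.int64(4), T.int64(4), T.int64(4)):
        with T.block("T_matmul_NT"):
            v_i0, v_i1, v_k = T.axis.remap("SSR", [i0, i1, k])
            with T.init():
                out[v_i0, v_i1] = 0
            out[v_i0, v_i1] = out[v_i0, v_i1] + T.Cast("int32", data[v_i0, v_k]) * T.Cast("int32", weight[v_i1, v_k])

@T.prim_func(private=True)
def matmul_nn(data: T.Buffer((T.int64(4), T.int64(4)), "uint8"),
              weight: T.Buffer((T.int64(4), T.int64(4)), "int8"),
              out: T.Buffer((T.int64(4), T.int64(4)), "int32")):
    # Transcribed from the (+) side: weight read as weight[v_k, v_i1] (NN).
    for i0, i1, k in T.grid(T.int64(4), T.int64(4), T.int64(4)):
        with T.block("matmul"):
            v_i0, v_i1, v_k = T.axis.remap("SSR", [i0, i1, k])
            with T.init():
                out[v_i0, v_i1] = 0
            out[v_i0, v_i1] = out[v_i0, v_i1] + T.Cast("int32", data[v_i0, v_k]) * T.Cast("int32", weight[v_k, v_i1])

# Expected to print False: with names ignored, the swapped weight indices
# still make the two blocks structurally different -- which is exactly what
# tensorization pattern matching sees.
print(tvm.ir.structural_equal(dense_nt, matmul_nn))
```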