ddwolf opened a new issue, #18344:
URL: https://github.com/apache/tvm/issues/18344

   ### Expected behavior
   
   I am using tune_tir to tune my own operator, which invokes a T.macro-decorated function several times; that macro contains a T.block("some_name").
   
   I want to get the tuned TIR back from tune_tir.
   
   ### Actual behavior
   
   I am using TVM 0.21.0's tune_tir to tune my operator, which invokes a T.macro-decorated function many times. tune_tir fails with an error like this:
   ```
   Duplicated block name  in function main not supported!
   ```
   
   As I understand it, each macro call site is replaced by the macro's body, so it is understandable that TVM ends up with duplicate block names when the macro declares its own block.
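
   Roughly, I'd expect the expanded function to contain the same block name twice, something like the sketch below (illustrative only, not the exact script TVM prints):
   
   ```python
    import tvm
    from tvm.script import tir as T
    
    # Rough sketch of the post-expansion shape; the names are illustrative.
    @T.prim_func
    def expanded_sketch() -> None:
        with T.block("shared_block"):  # from the first shared_computation() call
            T.evaluate(0)
        with T.block("shared_block"):  # from the second call -> duplicate name
            T.evaluate(0)
   ```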
   
   Question 1: should all occurrences of the block name defined in the macro be treated as the same block, or should each occurrence be tuned separately?
   
   Question 2: is there a good way to work around this problem right now?
   
   My current workaround is to remove the block declaration from the macro and allocate the memory outside of it, which is much less convenient than doing it inside the macro. I don't think this is a good approach.
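
   A minimal sketch of that workaround, assuming the block (with a unique name) is written at each call site instead of inside the macro; the function and block names here are made up for illustration:
   
   ```python
    import tvm
    from tvm.script import tir as T
    
    @T.macro
    def shared_computation():
        # The macro no longer declares its own block; each caller wraps it.
        T.evaluate(0)
    
    @T.prim_func
    def demo() -> None:
        with T.block("call_site_0"):  # unique block name at the first call site
            shared_computation()
        with T.block("call_site_1"):  # unique block name at the second call site
            shared_computation()
   ```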
   
   Another way to get around it is to post-process the script TVM generates and give every block a unique name, but that requires me to modify the TVM source code itself, so it is not a good option either.
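
   A related idea, which I have not verified on 0.21.0, would be to pass the block name into the macro as an argument, so that each call site expands to a differently named block (this assumes macro arguments are substituted into T.block names at expansion time):
   
   ```python
    import tvm
    from tvm.script import tir as T
    
    @T.macro
    def shared_computation(block_name):
        # Assumes the string argument is substituted into the block name when
        # the macro is expanded; unverified on 0.21.0.
        with T.block(block_name):
            T.evaluate(0)
    
    @T.prim_func
    def demo() -> None:
        shared_computation("shared_block_0")  # first call site, unique name
        shared_computation("shared_block_1")  # second call site, unique name
   ```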
   
   ### Environment
   
   - OS: Ubuntu 22.04
   - tvm: 0.21.0
   
   ### Steps to reproduce
   
   ##### This script reproduces the issue
   ```python
   import tvm
   from tvm.script import tir as T
   from tvm import meta_schedule as ms
   import tempfile
   
   @T.macro
   def shared_computation():
       """T.Macro function containing a block"""
       with T.block("shared_block"):  # This block name will be duplicated
           T.evaluate(0)
   
   @T.prim_func
   def macro_bug_demo(a: T.handle, b: T.handle, c: T.handle) -> None:
       """Function with multiple T.Macro call sites"""
       A = T.match_buffer(a, [32, 32], dtype="float32")
       B = T.match_buffer(b, [32, 32], dtype="float32") 
       C = T.match_buffer(c, [32, 32], dtype="float32")
   
       # Initialization
       for i, j in T.grid(32, 32):
           with T.block("init"):
               vi, vj = T.axis.remap("SS", [i, j])
               C[vi, vj] = T.float32(0)
   
       # First call site
       shared_computation()
       
       # Some computation
       for i, j, k in T.grid(16, 32, 32):
           with T.block("matmul1"):
               vi, vj, vk = T.axis.remap("SSR", [i, j, k])
               C[vi, vj] = C[vi, vj] + A[vi, vk] * B[vk, vj]
       
       # Second call site - creates duplicate "shared_block" name
       shared_computation()
   
   # Trigger the bug
   with tempfile.TemporaryDirectory() as work_dir:
       ms.tir_integration.tune_tir(
           mod=macro_bug_demo,
           target="llvm -num-cores 1",
           work_dir=work_dir,
           max_trials_global=4,
       )
   ```
   
   ### Triage
   
   * tune:meta_schedule
   

