CR1SceNT opened a new issue, #12921:
URL: https://github.com/apache/tvm/issues/12921

   I'm trying to use the auto scheduler to tune a matmul_add function. The code 
works fine on Linux, but on Windows every program fails with an error during 
measurement. 
   ```
   ----------------------------------------------------------------------
   ------------------------------  [ Search ]
   ----------------------------------------------------------------------
   Generate Sketches               #s: 3
   Sample Initial Population       #s: 905 fail_ct: 829    Time elapsed: 1.21
   GA Iter: 0      Max score: 0.9997       Min score: 0.8380       #Pop: 128    #M+: 0  #M-: 0
   GA Iter: 4      Max score: 0.9999       Min score: 0.9859       #Pop: 128    #M+: 1372       #M-: 82
   EvolutionarySearch              #s: 128 Time elapsed: 5.39
   ----------------------------------------------------------------------
   ------------------------------  [ Measure ]
   ----------------------------------------------------------------------
   Get 16 programs to measure:
   ................*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E
   Time elapsed for measurement: 14.04 s
   ----------------------------------------------------------------------
   ------------------------------  [ Done ]
   ----------------------------------------------------------------------
   ```
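
To make the measurement line in the log easier to read, here is a minimal sketch that tallies its progress markers. It assumes the marker convention I believe the auto scheduler uses with `verbose=1` (`.` for a successful build, `*` for a successful run, and an `E` or `T` suffix for an error or timeout); that convention is not stated in this issue. The names `MARKER_NAMES` and `tally_markers` are hypothetical helpers, not part of TVM.

```python
import re
from collections import Counter

# ASSUMPTION: marker meanings are inferred, not confirmed by the issue text.
# "." / "*" mark a successful build / run; a trailing "E" or "T" marks an
# error or a timeout for that stage.
MARKER_NAMES = {
    ".": "build_ok",
    ".E": "build_error",
    ".T": "build_timeout",
    "*": "run_ok",
    "*E": "run_error",
    "*T": "run_timeout",
}


def tally_markers(progress_line: str) -> Counter:
    """Count each kind of progress marker in one measurement line."""
    # Inside a character class, "." and "*" are literal characters.
    tokens = re.findall(r"[.*][ET]?", progress_line.strip())
    return Counter(MARKER_NAMES[token] for token in tokens)


# Same shape as the line in the log above: 16 dots followed by 16 "*E".
line = "." * 16 + "*E" * 16
print(dict(tally_markers(line)))  # {'build_ok': 16, 'run_error': 16}
```

If that reading is right, all 16 programs built successfully and all 16 failed at the run stage, which would point at the runner rather than at code generation.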
   The code is shown below:
   ```python
   import tvm
   from tvm import auto_scheduler, te
   
   @auto_scheduler.register_workload  # Note the auto_scheduler decorator
   def matmul_add(N, L, M, dtype):
       # A * B + C
       A = te.placeholder((N, L), name="A", dtype=dtype)
       B = te.placeholder((L, M), name="B", dtype=dtype)
       C = te.placeholder((N, M), name="C", dtype=dtype)
   
       k = te.reduce_axis((0, L), name="k")
       matmul = te.compute(
           (N, M),
           lambda i, j: te.sum(A[i, k] * B[k, j], axis=k),
           name="matmul",
           attrs={"layout_free_placeholders": [B]},  # enable automatic layout transform for tensor B
       )
       out = te.compute((N, M), lambda i, j: matmul[i, j] + C[i, j], name="out")
   
       return [A, B, C, out]
   
   target = tvm.target.Target("llvm -mcpu=core-avx2")
   
   N = L = M = 4
   task = auto_scheduler.SearchTask(
       func=matmul_add, args=(N, L, M, "float32"), target=target
   )
   print("Computational DAG:")
   print(task.compute_dag)
   
   log_file = "matmul_add.json"
   
   print("init cost model")
   cost_model = auto_scheduler.XGBModel()
   
   tune_option = auto_scheduler.TuningOptions(
       num_measure_trials=16,  # change this to 1000 to achieve the best performance
       runner=auto_scheduler.LocalRunner(repeat=10, enable_cpu_cache_flush=True),
       measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
       verbose=1,
   )
   
   print("start tune")
   task.tune(tune_option)
   ```
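
Since `RecordToFile` writes every measured program to `matmul_add.json`, the error codes can be summarized without re-running the tuning. The sketch below is a rough helper under two assumptions that this issue does not confirm: that each log line is a JSON object whose `"r"` field is `[costs, error_no, all_cost, timestamp]`, and that the error numbers follow the order of `auto_scheduler.measure.MeasureErrorNo`. `ERROR_NAMES` and `tally_error_codes` are hypothetical names, not TVM APIs.

```python
import json
from collections import Counter

# ASSUMPTION: numbering mirrors auto_scheduler's MeasureErrorNo enum.
ERROR_NAMES = {
    0: "NO_ERROR",
    1: "INSTANTIATION_ERROR",
    2: "COMPILE_HOST",
    3: "COMPILE_DEVICE",
    4: "RUNTIME_DEVICE",
    5: "WRONG_ANSWER",
    6: "BUILD_TIMEOUT",
    7: "RUN_TIMEOUT",
    8: "UNKNOWN_ERROR",
}


def tally_error_codes(lines) -> Counter:
    """Count measurement records per error code.

    `lines` is any iterable of strings, for example an open log file.
    ASSUMPTION: each non-empty line is a JSON record whose "r" field is
    [costs, error_no, all_cost, timestamp].
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        error_no = json.loads(line)["r"][1]
        counts[ERROR_NAMES.get(error_no, f"UNRECOGNIZED_{error_no}")] += 1
    return counts
```

It can be called as `tally_error_codes(open("matmul_add.json"))`. As far as I know the log file stores only the error number, not the error message, so this narrows down the failing stage but does not show the underlying exception.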
   
   ### Environment
   
   OS: Windows 10. TVM version: 0.10.dev644+gfa5045bf6
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

