I just built and installed the latest TVM following the instructions at
https://tvm.apache.org/docs/install/from_source.html.
Tuning works well for a CPU kernel, but when I try the same script with a GPU (CUDA) target it does not work: all 40 candidates are sent to the builder and runner, yet the final report shows `Total trials: 0` and no measured latency. Could anyone help me?
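For diagnosis, here is a minimal sanity check (assuming device 0) that the TVM build was compiled with CUDA and can see a GPU:
```
import tvm

# Both should be truthy if TVM was built with USE_CUDA and a CUDA device is visible.
print(tvm.cuda(0).exist)
print(tvm.get_global_func("device_api.cuda", allow_missing=True) is not None)
```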
Program:
```
import tvm
from tvm import te
from tvm import meta_schedule as ms
from tvm.meta_schedule.runner.config import EvaluatorConfig
from tvm.te import create_prim_func
from typing import Tuple

M, N, K = 64, 64, 64


def matmul(
    n: int, m: int, k: int, in_dtype: str = "float32", out_dtype: str = "float32"
) -> Tuple[te.Tensor, te.Tensor, te.Tensor]:
    a = te.placeholder((n, k), name="A", dtype=in_dtype)
    b = te.placeholder((k, m), name="B", dtype=in_dtype)
    k = te.reduce_axis((0, k), name="k")  # reduction axis over the shared dimension
    c = te.compute(
        (n, m),
        lambda i, j: te.sum(a[i, k].astype(out_dtype) * b[k, j].astype(out_dtype), axis=[k]),
        name="C",
    )
    return (a, b, c)


if __name__ == "__main__":
    # target = tvm.target.Target("cuda -max_threads_per_block 1024 -max_shared_memory_per_block 49152")
    target = tvm.target.Target(
        {
            "kind": "cuda",
            "arch": "sm_70",
            "max_threads_per_block": 1024,
            "max_shared_memory_per_block": 49152,
        }
    )  # V100
    database = ms.tune_tir(
        mod=create_prim_func(matmul(N, K, M, in_dtype="float32", out_dtype="float32")),
        target=target,
        max_trials_global=128,
        num_trials_per_iter=32,
        work_dir="./",
        runner=ms.runner.LocalRunner(
            evaluator_config=EvaluatorConfig(
                number=3,
                enable_cpu_cache_flush=False,
            )
        ),
        cost_model=ms.cost_model.XGBModel(
            extractor=ms.feature_extractor.PerStoreFeature(),
            adaptive_training=False,
        ),
        strategy=ms.search_strategy.EvolutionarySearch(),
    )
```
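After tuning, the natural next step is to pull the best schedule back out of the returned database; a minimal sketch of that step, assuming the `ms.tir_integration.compile_tir` helper from recent TVM:
```
# Query the tuned database for the best schedule of this workload.
mod = create_prim_func(matmul(N, K, M, in_dtype="float32", out_dtype="float32"))
sch = ms.tir_integration.compile_tir(database, mod, target)
if sch is None:
    print("No valid schedule found!")
else:
    sch.mod.show()  # print the tuned TIR
```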
Result / output:
```
2025-10-14 05:51:09 [INFO] Logging directory: ./logs
2025-10-14 05:51:16 [INFO] LocalBuilder: max_workers = 16
2025-10-14 05:51:19 [INFO] [task_scheduler.cc:168] Initializing Task #0: "main"
2025-10-14 05:51:21 [INFO] [task_scheduler.cc:329]
 ID | Name |   FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
-----------------------------------------------------------------------------------------------------
  0 | main | 524288 |      1 |            N/A |          N/A |                   N/A |      0 |
-----------------------------------------------------------------------------------------------------
Total trials: 0
Total latency (us): 0
2025-10-14 05:51:21 [INFO] [task_scheduler.cc:189] TaskScheduler picks Task #0: "main"
2025-10-14 05:51:24 [INFO] [task_scheduler.cc:202] Sending 16 sample(s) to builder
2025-10-14 05:51:32 [INFO] [task_scheduler.cc:204] Sending 16 sample(s) to runner
2025-10-14 05:51:32 [INFO] [task_scheduler.cc:246] [Updated] Task #0: "main"
2025-10-14 05:51:32 [INFO] [task_scheduler.cc:329]
 ID | Name |   FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
-----------------------------------------------------------------------------------------------------
  0 | main | 524288 |      1 |            N/A |          N/A |                   N/A |     16 |
-----------------------------------------------------------------------------------------------------
Total trials: 0
Total latency (us): 0
2025-10-14 05:51:32 [INFO] [task_scheduler.cc:189] TaskScheduler picks Task #0: "main"
2025-10-14 05:51:35 [INFO] [task_scheduler.cc:202] Sending 16 sample(s) to builder
2025-10-14 05:51:43 [INFO] [task_scheduler.cc:204] Sending 16 sample(s) to runner
2025-10-14 05:51:43 [INFO] [task_scheduler.cc:246] [Updated] Task #0: "main"
2025-10-14 05:51:43 [INFO] [task_scheduler.cc:329]
 ID | Name |   FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
-----------------------------------------------------------------------------------------------------
  0 | main | 524288 |      1 |            N/A |          N/A |                   N/A |     32 |
-----------------------------------------------------------------------------------------------------
Total trials: 0
Total latency (us): 0
2025-10-14 05:51:43 [INFO] [task_scheduler.cc:189] TaskScheduler picks Task #0: "main"
2025-10-14 05:51:45 [INFO] [task_scheduler.cc:202] Sending 8 sample(s) to builder
2025-10-14 05:51:52 [INFO] [task_scheduler.cc:204] Sending 8 sample(s) to runner
2025-10-14 05:51:52 [INFO] [task_scheduler.cc:246] [Updated] Task #0: "main"
2025-10-14 05:51:52 [INFO] [task_scheduler.cc:329]
 ID | Name |   FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
-----------------------------------------------------------------------------------------------------
  0 | main | 524288 |      1 |            N/A |          N/A |                   N/A |     40 |
-----------------------------------------------------------------------------------------------------
Total trials: 0
Total latency (us): 0
2025-10-14 05:51:52 [INFO] [task_scheduler.cc:269] Task #0 has finished. Remaining task(s): 0
2025-10-14 05:51:52 [INFO] [task_scheduler.cc:329]
 ID | Name |   FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
-----------------------------------------------------------------------------------------------------
  0 | main | 524288 |      1 |            N/A |          N/A |                   N/A |     40 | Y
-----------------------------------------------------------------------------------------------------
Total trials: 0
Total latency (us): 0
```
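Since the scheduler reports 40 candidates sent to the runner but `Total trials: 0`, one way to inspect what was actually recorded is to reload the JSON database from `work_dir` (assuming the default file names `tune_tir` writes) and look at each record's measured times:
```
from tvm import meta_schedule as ms

# Assumes the default database files produced by tune_tir in work_dir="./".
db = ms.database.JSONDatabase(
    path_workload="./database_workload.json",
    path_tuning_record="./database_tuning_record.json",
)
for rec in db.get_all_tuning_records():
    # Missing/empty run_secs suggests the candidate failed to build or run.
    print(rec.run_secs)
```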