tkonolige opened a new issue #6096:
URL: https://github.com/apache/incubator-tvm/issues/6096
I'm running into a weird bug where `apply_history_best` has no effect unless
`extract_from_program` has been called earlier in the same process, even when
the log file already exists.
Here is a minimal example:
```python
import os
import sys

import numpy as np
import tvm
from tvm import te
from tvm import autotvm
from tvm import relay
import tvm.relay.testing
from tvm.autotvm.tuner import XGBTuner, GATuner, RandomTuner, GridSearchTuner
from tvm.contrib.util import tempdir
import tvm.contrib.graph_runtime as runtime

input_shape = (1, 3, 224, 224)
output_shape = (1, 1000)


def run(mod, params, ctx):
    lib = relay.build(mod, target="llvm", params=params)
    dummy_data = np.random.uniform(size=input_shape).astype("float32")

    m = runtime.GraphModule(lib['default'](ctx))
    m.set_input('data', dummy_data)
    m.run()
    tvm_output = m.get_output(0)

    ftimer = m.module.time_evaluator("run", ctx, repeat=5, number=5)
    prof_res = np.array(ftimer().results) * 1000
    print(
        "%-20s %-19s (%s)"
        % ("Runtime:", "%.2f ms" % np.mean(prof_res), "%.2f ms" % np.std(prof_res))
    )


if __name__ == "__main__":
    mod, params = tvm.relay.testing.resnet.get_workload()
    ctx = tvm.cpu()
    print("Untuned")
    run(mod, params, ctx)

    log_filename = "bug_tuning.log"
    tmp_log_file = log_filename + ".tmp"
    if os.path.exists(tmp_log_file):
        os.remove(tmp_log_file)

    if sys.argv[1] == "tune" or sys.argv[1] == "extract":
        tasks = autotvm.task.extract_from_program(
            mod["main"], target="llvm", params=params
        )

    if sys.argv[1] == "tune":
        for i, task in enumerate(tasks[0:4]):
            prefix = "[Task %2d/%2d] " % (i + 1, len(tasks))
            tuner_obj = GridSearchTuner(task)

            # do tuning
            measure_option = autotvm.measure_option(
                builder=autotvm.LocalBuilder(),
                runner=autotvm.LocalRunner(number=10, repeat=1, min_repeat_ms=1000),
            )
            n_trial = min(len(task.config_space), 10)
            tuner_obj.tune(
                n_trial=n_trial,
                early_stopping=False,
                measure_option=measure_option,
                callbacks=[
                    autotvm.callback.progress_bar(n_trial, prefix=prefix),
                    autotvm.callback.log_to_file(tmp_log_file),
                ],
            )

        # pick best records to a cache file
        autotvm.record.pick_best(tmp_log_file, log_filename)
        os.remove(tmp_log_file)

    print("Tuned")
    with autotvm.apply_history_best(log_filename):
        run(mod, params, ctx)
```
Run it like so:
```bash
python3 simple_bug.py tune
python3 simple_bug.py extract
python3 simple_bug.py run
```
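Before comparing the three modes, it may be worth confirming that the `tune` step actually wrote records to `bug_tuning.log`. The following is a minimal sketch, not part of the original reproduction: `count_records` is a hypothetical helper, and it assumes each log line is a JSON object whose `input` field starts with the target string and the task name. That layout is an assumption about the AutoTVM log format and should be checked against the file itself.

```python
import json
from collections import Counter


def count_records(log_path):
    """Count log records per (target, task name) pair.

    Assumes one JSON object per line, with
    record["input"] = [target, task_name, args, kwargs].
    This layout is an assumption, not taken from the issue.
    """
    counts = Counter()
    with open(log_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            target, task_name = record["input"][0], record["input"][1]
            counts[(target, task_name)] += 1
    return counts
```

Calling `count_records("bug_tuning.log")` after the `tune` step and getting an empty counter would point at the log file rather than at `apply_history_best`.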
On my laptop I get the following output:
```
$ python3 simple_bug.py tune
Untuned
Cannot find config for target=llvm -keys=cpu, workload=('dense_nopack.x86', ('TENSOR', (1, 512), 'float32'), ('TENSOR', (1000, 512), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
Runtime: 41.48 ms (2.09 ms)
[Task 1/13] Current/Best: 17.93/ 34.07 GFLOPS | Progress: (10/10) | 27.28 s Done.
[Task 2/13] Current/Best: 7.23/ 7.24 GFLOPS | Progress: (10/10) | 18.49 s Done.
[Task 3/13] Current/Best: 14.49/ 14.49 GFLOPS | Progress: (10/10) | 22.80 s Done.
[Task 4/13] Current/Best: 14.46/ 14.46 GFLOPS | Progress: (10/10) | 18.14 s Done.
Tuned
Runtime: 178.57 ms (1.56 ms)
$ python3 simple_bug.py extract
Untuned
Cannot find config for target=llvm -keys=cpu, workload=('dense_nopack.x86', ('TENSOR', (1, 512), 'float32'), ('TENSOR', (1000, 512), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
Runtime: 40.87 ms (0.60 ms)
Tuned
Runtime: 178.05 ms (1.02 ms)
$ python3 simple_bug.py run
Untuned
Cannot find config for target=llvm -keys=cpu, workload=('dense_nopack.x86', ('TENSOR', (1, 512), 'float32'), ('TENSOR', (1000, 512), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
Runtime: 40.37 ms (0.75 ms)
Tuned
Runtime: 39.85 ms (0.69 ms)
```
Ignore the fact that tuning makes the results worse: I only run a few
iterations of the tuner so that the program finishes quickly.
With both `tune` and `extract`, the tuned runtime (about 178 ms) differs from
the untuned runtime (about 41 ms), which indicates that a tuned schedule was
used. With `run`, the tuned and untuned runtimes are the same within noise
(39.85 ms vs. 40.37 ms), so the log appears to be ignored. The only difference
between `extract` and `run` is the call to
`autotvm.task.extract_from_program`. I don't know what causes this discrepancy.
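The comparison above can be made explicit with a small check. This is a rough sketch using only the means and standard deviations printed in the output; the helper name `differs` and the threshold of three combined standard deviations are assumptions chosen for illustration, not a rigorous statistical test.

```python
# (untuned_mean, untuned_std, tuned_mean, tuned_std) in ms, copied from the output
results = {
    "tune": (41.48, 2.09, 178.57, 1.56),
    "extract": (40.87, 0.60, 178.05, 1.02),
    "run": (40.37, 0.75, 39.85, 0.69),
}


def differs(untuned_mean, untuned_std, tuned_mean, tuned_std, k=3.0):
    """Return True if the means differ by more than k combined standard deviations.

    k=3.0 is an arbitrary threshold assumed for this sketch.
    """
    noise = (untuned_std ** 2 + tuned_std ** 2) ** 0.5
    return abs(tuned_mean - untuned_mean) > k * noise


for mode, stats in results.items():
    print(mode, differs(*stats))
```

With these numbers the check reports a difference for `tune` and `extract` but not for `run`, which matches the reading that the tuned log only takes effect when `extract_from_program` has been called.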