tkonolige opened a new issue #6096:
URL: https://github.com/apache/incubator-tvm/issues/6096


   I'm running into a weird bug where `apply_history_best` has no effect unless `extract_from_program` has been called earlier in the same process, even though the log file already exists.
   
   Here is a minimal example:
   ```python
   import os
   import sys
   
   import numpy as np
   import tvm
   from tvm import te
   from tvm import autotvm
   from tvm import relay
   import tvm.relay.testing
   from tvm.autotvm.tuner import XGBTuner, GATuner, RandomTuner, GridSearchTuner
   from tvm.contrib.util import tempdir
   import tvm.contrib.graph_runtime as runtime
   
   input_shape = (1, 3, 224, 224)
   output_shape = (1, 1000)
   
   def run(mod, params, ctx):
       lib = relay.build(mod, target="llvm", params=params)
   
       dummy_data = np.random.uniform(size=input_shape).astype("float32")
   
       m = runtime.GraphModule(lib['default'](ctx))
       m.set_input('data', dummy_data)
       m.run()
       tvm_output = m.get_output(0)
   
       ftimer = m.module.time_evaluator("run", ctx, repeat=5, number=5)
       prof_res = np.array(ftimer().results) * 1000
       print(
           "%-20s %-19s (%s)"
           % ("Runtime:", "%.2f ms" % np.mean(prof_res), "%.2f ms" % np.std(prof_res))
       )
   
   
   if __name__ == "__main__":
       mod, params = tvm.relay.testing.resnet.get_workload()
   
       ctx = tvm.cpu()
       print("Untuned")
       run(mod, params, ctx)
   
       log_filename = "bug_tuning.log"
       tmp_log_file = log_filename + ".tmp"
       if os.path.exists(tmp_log_file):
           os.remove(tmp_log_file)
   
       if sys.argv[1] == "tune" or sys.argv[1] == "extract":
           tasks = autotvm.task.extract_from_program(mod["main"], target="llvm",
                                                     params=params,
                                                     )
   
           if sys.argv[1] == "tune":
               for i, task in enumerate(tasks[0:4]):
                   prefix = "[Task %2d/%2d] " % (i+1, len(tasks))
                   tuner_obj = GridSearchTuner(task)
   
                   # do tuning
                   measure_option = autotvm.measure_option(
                           builder=autotvm.LocalBuilder(),
                           runner=autotvm.LocalRunner(number=10, repeat=1,
                                                      min_repeat_ms=1000),
                       )
                   n_trial= min(len(task.config_space), 10)
                   tuner_obj.tune(n_trial=n_trial,
                                  early_stopping=False,
                                  measure_option=measure_option,
                                  callbacks=[
                                       autotvm.callback.progress_bar(n_trial, prefix=prefix),
                                       autotvm.callback.log_to_file(tmp_log_file)])
   
               # pick best records to a cache file
               autotvm.record.pick_best(tmp_log_file, log_filename)
               os.remove(tmp_log_file)
   
       print("Tuned")
       with autotvm.apply_history_best(log_filename):
           run(mod, params, ctx)
   ```
   
   Run it like so:
   ```bash
   python3 simple_bug.py tune
   python3 simple_bug.py extract
   python3 simple_bug.py run
   ```
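   To rule out the log file itself, the records it contains can be inspected directly. Here is a small helper (hypothetical, assuming the standard autotvm JSON log format, in which each record's `"input"` field begins with the target string and the task name):

   ```python
   import json

   def workloads_in_log(path):
       """Collect (target, task name) pairs from an autotvm JSON log.

       Assumes the standard autotvm log format, where each non-empty line
       is a JSON object whose "input" field begins with
       [target_string, task_name, ...].
       """
       pairs = []
       with open(path) as f:
           for line in f:
               line = line.strip()
               if not line:
                   continue
               record = json.loads(line)
               pairs.append((record["input"][0], record["input"][1]))
       return pairs
   ```

   Running it on `bug_tuning.log` before each invocation would confirm whether the same records are present in all three modes.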
   
   On my laptop I get the following output:
   ```
   $ python3 simple_bug.py tune
   Untuned
   Cannot find config for target=llvm -keys=cpu, workload=('dense_nopack.x86', ('TENSOR', (1, 512), 'float32'), ('TENSOR', (1000, 512), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
   Runtime:             41.48 ms            (2.09 ms)
   [Task  1/13]  Current/Best:   17.93/  34.07 GFLOPS | Progress: (10/10) | 27.28 s Done.
   [Task  2/13]  Current/Best:    7.23/   7.24 GFLOPS | Progress: (10/10) | 18.49 s Done.
   [Task  3/13]  Current/Best:   14.49/  14.49 GFLOPS | Progress: (10/10) | 22.80 s Done.
   [Task  4/13]  Current/Best:   14.46/  14.46 GFLOPS | Progress: (10/10) | 18.14 s Done.
   Tuned
   Runtime:             178.57 ms           (1.56 ms)
   
   $ python3 simple_bug.py extract
   Untuned
   Cannot find config for target=llvm -keys=cpu, workload=('dense_nopack.x86', ('TENSOR', (1, 512), 'float32'), ('TENSOR', (1000, 512), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
   Runtime:             40.87 ms            (0.60 ms)
   Tuned
   Runtime:             178.05 ms           (1.02 ms)
   
   $ python3 simple_bug.py run
   Untuned
   Cannot find config for target=llvm -keys=cpu, workload=('dense_nopack.x86', ('TENSOR', (1, 512), 'float32'), ('TENSOR', (1000, 512), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
   Runtime:             40.37 ms            (0.75 ms)
   Tuned
   Runtime:             39.85 ms            (0.69 ms)
   ```
   
   Ignore the fact that tuning makes the results worse; I only run a few iterations of the tuner so the example finishes quickly.
   
   On both `tune` and `extract`, the tuned performance differs from the untuned performance, indicating that a tuned schedule was used. But on `run`, the tuned and untuned performance are identical. The only difference between `extract` and `run` is the call to `autotvm.task.extract_from_program`, so I'm not clear what exactly causes this discrepancy.
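   My working theory is that `apply_history_best` behaves like a dictionary lookup keyed on `(target, workload)`, and that a missed lookup silently falls back to the default schedule. A simplified model (hypothetical names, not the real TVM implementation) shows how a key mismatch would reproduce exactly this "tuned equals untuned" behavior without any error:

   ```python
   # Simplified model of history-best dispatch (hypothetical names, not
   # the real TVM implementation).
   FALLBACK = "fallback-config"

   class HistoryBest:
       def __init__(self, records):
           # records: {(target, workload): best_config}
           self.best = dict(records)

       def query(self, target, workload):
           # A miss is silent: the caller simply gets the fallback config,
           # so the module is compiled with untuned schedules.
           return self.best.get((target, workload), FALLBACK)

   ctx = HistoryBest({("llvm", "conv2d_v1"): "tuned-config"})
   assert ctx.query("llvm", "conv2d_v1") == "tuned-config"  # exact match: tuned
   assert ctx.query("llvm", "conv2d_v2") == FALLBACK        # mismatched key: silent fallback
   ```

   If that is what is happening, the call to `extract_from_program` must be changing which workload keys get registered or queried at build time.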


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

