https://discuss.tvm.apache.org/t/vta-autotuning-from-tutorial-fails-with-one-pynq-but-succeeds-with-two-pynqs/4265/3?u=hht

I found a workaround for autotuning with a single PYNQ and located the problem. In the VTA autotuning tutorial, there is a handle named `remote`.
The `remote` handle does two things. First, it programs the FPGA:

```
if env.TARGET != "sim":
    # Get remote from fleet node
    remote = autotvm.measure.request_remote(
        env.TARGET, tracker_host, tracker_port, timeout=10000
    )
    # Reconfigure the JIT runtime and FPGA.
    vta.reconfig_runtime(remote)
    vta.program_fpga(remote, bitstream=None)
else:
    # In simulation mode, host the RPC server locally.
    remote = rpc.LocalSession()
```

Second, it runs the whole network and reports the result after autotuning:

```
# Compile kernels with history best records
with autotvm.tophub.context(target, extra_files=[log_file]):
    # Compile network
    print("Compile...")
    if target.device_name != "vta":
        with tvm.transform.PassContext(opt_level=3, disabled_pass={"AlterOpLayout"}):
            lib = relay.build(
                relay_prog, target=target, params=params, target_host=env.target_host
            )
    else:
        with vta.build_config(opt_level=3, disabled_pass={"AlterOpLayout"}):
            lib = relay.build(
                relay_prog, target=target, params=params, target_host=env.target_host
            )

    # Export library
    print("Upload...")
    temp = util.tempdir()
    lib.save(temp.relpath("graphlib.o"))
    remote.upload(temp.relpath("graphlib.o"))
    lib = remote.load_module("graphlib.o")

    # Generate the graph runtime
    ctx = remote.ext_dev(0) if device == "vta" else remote.cpu(0)
    m = graph_runtime.GraphModule(lib["default"](ctx))

    # Upload parameters to device
    image = tvm.nd.array((np.random.uniform(size=(1, 3, 224, 224))).astype("float32"))
    m.set_input("data", image)

    # Evaluate
    print("Evaluate inference time cost...")
    timer = m.module.time_evaluator("run", ctx, number=1, repeat=10)
    tcost = timer()
    prof_res = np.array(tcost.results) * 1000  # convert to millisecond
    print(
        "Mean inference time (std dev): %.2f ms (%.2f ms)"
        % (np.mean(prof_res), np.std(prof_res))
    )
```

The `remote` occupies a device the whole time, yet it plays no role in the autotuning itself. So my workaround is to comment out the code above to release the `remote`, and it works:

```
Extract tasks...
Extracted 10 conv2d tasks:
(1, 14, 14, 256, 512, 1, 1, 0, 0, 2, 2)
(1, 28, 28, 128, 256, 1, 1, 0, 0, 2, 2)
(1, 56, 56, 64, 128, 1, 1, 0, 0, 2, 2)
(1, 56, 56, 64, 64, 3, 3, 1, 1, 1, 1)
(1, 28, 28, 128, 128, 3, 3, 1, 1, 1, 1)
(1, 56, 56, 64, 128, 3, 3, 1, 1, 2, 2)
(1, 14, 14, 256, 256, 3, 3, 1, 1, 1, 1)
(1, 28, 28, 128, 256, 3, 3, 1, 1, 2, 2)
(1, 7, 7, 512, 512, 3, 3, 1, 1, 1, 1)
(1, 14, 14, 256, 512, 3, 3, 1, 1, 2, 2)
Tuning...
[Task  1/10]  Current/Best:  0.00/ 28.79 GFLOPS | Progress: (480/480)   | 306.61 s Done.
[Task  2/10]  Current/Best:  0.00/ 31.41 GFLOPS | Progress: (576/576)   | 389.47 s Done.
[Task  3/10]  Current/Best:  0.00/ 43.20 GFLOPS | Progress: (1000/1000) | 667.90 s Done.
[Task  4/10]  Current/Best:  0.00/ 46.37 GFLOPS | Progress: (1000/1000) | 564.08 s Done.
[Task  5/10]  Current/Best:  0.00/ 38.90 GFLOPS | Progress: (1000/1000) | 641.09 s Done.
[Task  6/10]  Current/Best:  0.00/ 44.39 GFLOPS | Progress: (1000/1000) | 560.03 s Done.
[Task  7/10]  Current/Best:  0.00/ 40.67 GFLOPS | Progress: (1000/1000) | 731.33 s Done.
[Task  8/10]  Current/Best:  0.00/  9.58 GFLOPS | Progress: (1000/1000) | 1046.03 s Done.
[Task  9/10]  Current/Best:  0.00/ 12.51 GFLOPS | Progress: (1000/1000) | 1276.48 s Done.
[Task 10/10]  Current/Best:  0.31/ 11.95 GFLOPS | Progress: (480/480)   | 619.91 s Done.
```

---

[Visit Topic](https://discuss.tvm.apache.org/t/vta-workaround-for-autotuning-with-one-pynq-z1-board/8091/1) to respond.
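To see why a single board gets starved, here is a toy model of the tracker's device leasing. This is my own sketch to illustrate the contention, not TVM's actual RPC tracker code; the class and device names are made up:

```python
# Toy model of an RPC tracker's device pool (NOT TVM's real implementation).
# It only illustrates the failure mode: a long-lived session holds the one
# registered board, so the tuner's measure request can never be served.
import queue


class ToyTracker:
    """Hands out exclusive leases on registered devices."""

    def __init__(self, devices):
        self._pool = queue.Queue()
        for dev in devices:
            self._pool.put(dev)

    def request(self, timeout=0.1):
        # Blocks until a device is free; raises queue.Empty on timeout,
        # which the tuner would observe as measurement failures/hangs.
        return self._pool.get(timeout=timeout)

    def release(self, dev):
        self._pool.put(dev)


tracker = ToyTracker(["pynq-0"])      # only one PYNQ-Z1 board registered

remote = tracker.request()            # the tutorial's persistent `remote`
try:
    tracker.request()                 # the tuner asks for a board to measure on
    tuner_got_device = True
except queue.Empty:
    tuner_got_device = False          # starved: the only board is leased

tracker.release(remote)               # the workaround: free the board
worker = tracker.request()            # now the tuner can get it
```

With two boards in the pool, the persistent `remote` and the tuner each get one, which matches the observation in the linked thread that tuning succeeds with two PYNQs.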