https://discuss.tvm.apache.org/t/vta-autotuning-from-tutorial-fails-with-one-pynq-but-succeeds-with-two-pynqs/4265/3?u=hht

I found a workaround for autotuning with a single PYNQ and located the problem. In the VTA autotuning tutorial, there is a handle named `remote`.
The `remote` handle does two things. First, it programs the FPGA:

```
if env.TARGET != "sim":
    # Get remote from fleet node
    remote = autotvm.measure.request_remote(
        env.TARGET, tracker_host, tracker_port, timeout=10000
    )
    # Reconfigure the JIT runtime and FPGA.
    vta.reconfig_runtime(remote)
    vta.program_fpga(remote, bitstream=None)
else:
    # In simulation mode, host the RPC server locally.
    remote = rpc.LocalSession()
```

Second, it runs the whole network and reports the result after autotuning:

```
# Compile kernels with history best records
with autotvm.tophub.context(target, extra_files=[log_file]):
    # Compile network
    print("Compile...")
    if target.device_name != "vta":
        with tvm.transform.PassContext(opt_level=3, disabled_pass={"AlterOpLayout"}):
            lib = relay.build(
                relay_prog, target=target, params=params, target_host=env.target_host
            )
    else:
        with vta.build_config(opt_level=3, disabled_pass={"AlterOpLayout"}):
            lib = relay.build(
                relay_prog, target=target, params=params, target_host=env.target_host
            )

    # Export library
    print("Upload...")
    temp = util.tempdir()
    lib.save(temp.relpath("graphlib.o"))
    remote.upload(temp.relpath("graphlib.o"))
    lib = remote.load_module("graphlib.o")

    # Generate the graph runtime
    ctx = remote.ext_dev(0) if device == "vta" else remote.cpu(0)
    m = graph_runtime.GraphModule(lib["default"](ctx))

    # Upload parameters to device
    image = tvm.nd.array((np.random.uniform(size=(1, 3, 224, 224))).astype("float32"))
    m.set_input("data", image)

    # Evaluate
    print("Evaluate inference time cost...")
    timer = m.module.time_evaluator("run", ctx, number=1, repeat=10)
    tcost = timer()
    prof_res = np.array(tcost.results) * 1000  # convert to millisecond
    print(
        "Mean inference time (std dev): %.2f ms (%.2f ms)"
        % (np.mean(prof_res), np.std(prof_res))
    )
```

The `remote` occupies a device the whole time, yet it plays no role in the autotuning itself. So my workaround is to comment out the code above to release the `remote`, and it works:

```
Extract tasks...
Extracted 10 conv2d tasks:
(1, 14, 14, 256, 512, 1, 1, 0, 0, 2, 2)
(1, 28, 28, 128, 256, 1, 1, 0, 0, 2, 2)
(1, 56, 56, 64, 128, 1, 1, 0, 0, 2, 2)
(1, 56, 56, 64, 64, 3, 3, 1, 1, 1, 1)
(1, 28, 28, 128, 128, 3, 3, 1, 1, 1, 1)
(1, 56, 56, 64, 128, 3, 3, 1, 1, 2, 2)
(1, 14, 14, 256, 256, 3, 3, 1, 1, 1, 1)
(1, 28, 28, 128, 256, 3, 3, 1, 1, 2, 2)
(1, 7, 7, 512, 512, 3, 3, 1, 1, 1, 1)
(1, 14, 14, 256, 512, 3, 3, 1, 1, 2, 2)
Tuning...
[Task  1/10]  Current/Best:  0.00/ 28.79 GFLOPS | Progress: (480/480)   | 306.61 s Done.
[Task  2/10]  Current/Best:  0.00/ 31.41 GFLOPS | Progress: (576/576)   | 389.47 s Done.
[Task  3/10]  Current/Best:  0.00/ 43.20 GFLOPS | Progress: (1000/1000) | 667.90 s Done.
[Task  4/10]  Current/Best:  0.00/ 46.37 GFLOPS | Progress: (1000/1000) | 564.08 s Done.
[Task  5/10]  Current/Best:  0.00/ 38.90 GFLOPS | Progress: (1000/1000) | 641.09 s Done.
[Task  6/10]  Current/Best:  0.00/ 44.39 GFLOPS | Progress: (1000/1000) | 560.03 s Done.
[Task  7/10]  Current/Best:  0.00/ 40.67 GFLOPS | Progress: (1000/1000) | 731.33 s Done.
[Task  8/10]  Current/Best:  0.00/  9.58 GFLOPS | Progress: (1000/1000) | 1046.03 s Done.
[Task  9/10]  Current/Best:  0.00/ 12.51 GFLOPS | Progress: (1000/1000) | 1276.48 s Done.
[Task 10/10]  Current/Best:  0.31/ 11.95 GFLOPS | Progress: (480/480)   | 619.91 s Done.
```

---

[Visit Topic](https://discuss.tvm.apache.org/t/vta-workaround-for-autotuning-with-one-pynq-z1-board/8091/1) to respond.
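To see why a single board gets starved, here is a toy model of the tracker's device leasing. This is my own sketch to illustrate the contention, not TVM's actual RPC tracker code; the class and device names are made up:

```python
# Toy model of an RPC tracker's device pool (NOT TVM's real implementation).
# It only illustrates the failure mode: a long-lived session holds the one
# registered board, so the tuner's measure request can never be served.
import queue


class ToyTracker:
    """Hands out exclusive leases on registered devices."""

    def __init__(self, devices):
        self._pool = queue.Queue()
        for dev in devices:
            self._pool.put(dev)

    def request(self, timeout=0.1):
        # Blocks until a device is free; raises queue.Empty on timeout,
        # which the tuner would observe as measurement failures/hangs.
        return self._pool.get(timeout=timeout)

    def release(self, dev):
        self._pool.put(dev)


tracker = ToyTracker(["pynq-0"])      # only one PYNQ-Z1 board registered

remote = tracker.request()            # the tutorial's persistent `remote`
try:
    tracker.request()                 # the tuner asks for a board to measure on
    tuner_got_device = True
except queue.Empty:
    tuner_got_device = False          # starved: the only board is leased

tracker.release(remote)               # the workaround: free the board
worker = tracker.request()            # now the tuner can get it
```

With two boards in the pool, the persistent `remote` and the tuner each get one, which matches the observation in the linked thread that tuning succeeds with two PYNQs.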