comaniac opened a new pull request #4995: [AutoTVM] Avoid using RPC for LocalRunner
URL: https://github.com/apache/incubator-tvm/pull/4995

### Motivation and Summary

`LocalRunner`, which measures the runtime of an op with a given schedule config directly on the host machine, is one of the two runners in AutoTVM. However, `LocalRunner` was derived from `RPCRunner` and launched a local RPC server. This implementation gave both runners a unified interface and logic, but it introduces two problems:

1. Overhead. Although a local RPC session doesn't have to send the built binary over the network, it still incurs 1) the RPC connection overhead, and 2) an additional binary copy (`RPCRunner` uploads the locally built binary to the remote RPC server; in the local RPC case, this means a copy from `/tmp` to `pwd`).
2. Reliability. As many people have reported on the discuss forum, the local RPC connection may be dropped or unstable. One possible cause is the tornado package, but it is hard to identify and fix. In any case, when we are using `LocalRunner`, it doesn't make sense to see an error or warning regarding the RPC connection.

This PR refactors the AutoTVM `Runner` and fully decouples `LocalRunner` from `RPCRunner`. In summary:

* Both `LocalRunner` and `RPCRunner` are now derived from `Runner`.
* Common logic and interfaces such as "prepare golden reference for correctness checking" and "run N configs in parallel" are lifted to the base `Runner` class.
* Each derived runner only needs to specify how to acquire the TVM context (local or remote) and how to run one config.
* No change to the user interfaces and APIs.

### Evaluation

Since the user interface and the underlying measurement remain the same, I only tested the first task extracted from ResNet-18 on CPU for evaluation. Both tests use `LocalRunner`.

Without this PR:

```
[Task 1/12] Current/Best: 20.81/ 169.89 GFLOPS | Progress: (252/252) | 556.76 s Done
```

With this PR:

```
[Task 1/12] Current/Best: 17.89/ 150.35 GFLOPS | Progress: (252/252) | 192.78 s Done
```

While I think the performance difference is due to measurement error, we can clearly see that this PR reduces the tuning time: from 556.76 s to 192.78 s for the same 252 trials, roughly 2.9x faster.

### Issue

For VTA using `RPCRunner`, the FPGA is reprogrammed before every measurement, but I am not sure whether we still need this step for the real local runner. Please help review and provide your suggestions (cc @tmoreau89).

@merrymercy @eqy @kevinthesun please help to review. Thanks.
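
For reviewers who want a quick picture of the class layout described under "Motivation and Summary", here is a minimal sketch. It is not the code in this PR: the class names mirror `Runner`, `LocalRunner`, and `RPCRunner`, but the method names (`acquire_context`, `run_one`, `run`), their signatures, and the placeholder return values are assumptions made up for illustration. It only shows the shape of the idea: shared parallel-dispatch logic in the base class, with each subclass supplying how to get a context and how to run one config.

```python
import time
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor


class Runner(ABC):
    """Hypothetical base class holding the logic shared by all runners."""

    def __init__(self, n_parallel=1):
        self.n_parallel = n_parallel

    @abstractmethod
    def acquire_context(self):
        """Return a handle to the device to measure on (local or remote)."""

    @abstractmethod
    def run_one(self, ctx, config):
        """Measure a single config and return its cost in seconds."""

    def run(self, configs):
        """Shared logic: measure configs with at most n_parallel in flight.

        Results are returned in the same order as the input configs.
        """
        ctx = self.acquire_context()
        with ThreadPoolExecutor(max_workers=self.n_parallel) as pool:
            return list(pool.map(lambda cfg: self.run_one(ctx, cfg), configs))


class LocalRunner(Runner):
    """Runs directly on the host; no RPC server or session is involved."""

    def acquire_context(self):
        # Placeholder string standing in for a real local device context.
        return "local-cpu"

    def run_one(self, ctx, config):
        # In this sketch a "config" is just a callable to be timed.
        start = time.perf_counter()
        config()
        return time.perf_counter() - start


class RPCRunner(Runner):
    """Would run on a remote device; only the context lookup is sketched."""

    def __init__(self, host, port, n_parallel=1):
        super().__init__(n_parallel)
        self.host = host
        self.port = port

    def acquire_context(self):
        # Placeholder: a real runner would open an RPC session here.
        return f"rpc://{self.host}:{self.port}"

    def run_one(self, ctx, config):
        raise NotImplementedError("remote execution is out of scope for this sketch")
```

With this layout, `LocalRunner(n_parallel=2).run([f, g])` times the callables `f` and `g` on the host and returns two costs, and `LocalRunner` no longer inherits anything from `RPCRunner`.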
