comaniac opened a new pull request #4995: [AutoTVM] Avoid using RPC for 
LocalRunner
URL: https://github.com/apache/incubator-tvm/pull/4995
 
 
   ### Motivation and Summary
   
   `LocalRunner`, which measures the runtime of an op with a certain schedule 
config on the host machine directly, is one of the two runners in AutoTVM. 
However, `LocalRunner` was derived from `RPCRunner` and launched a local RPC 
server. The reason for this implementation was to have a unified interface and 
logic for both runners, but it introduces two problems:
   
   1. Overhead.
   Although a local RPC session doesn't have to send the built binary over the 
network, it still incurs 1) the RPC connection overhead, and 2) an additional 
binary copy (`RPCRunner` uploads the locally built binary to the remote RPC 
server; in the local RPC case, this means a copy from `/tmp` to `pwd`). 
   
   2. Reliability.
   As many people have reported on the discuss forum, the local RPC connection 
may be dropped or unstable. One possible cause is the tornado package, but it 
is hard to identify and fix. In any case, when we are using `LocalRunner`, it 
doesn't make sense to see an error or warning regarding an RPC connection.
   
   This PR refactors AutoTVM `Runner` and fully decouples `LocalRunner` and 
`RPCRunner`. In summary:
   * Now both `LocalRunner` and `RPCRunner` are derived from `Runner`.
   * Common logic and interfaces such as "prepare golden reference for 
correctness checking" and "run N configs in parallel" are lifted to the base 
`Runner` class.
   * Each derived runner only needs to specify how to acquire the TVM context 
(local or remote) and how to run one config.
   * No change to the user interfaces and APIs.
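
   The structure described in the list above can be sketched roughly as 
follows. This is a minimal illustration, not the code in this PR: the method 
names `get_context`, `run_one`, and `run`, the constructor arguments, and the 
placeholder context strings are all hypothetical, and no TVM API is called.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor


class Runner(ABC):
    """Hypothetical base class holding the logic shared by all runners."""

    def __init__(self, n_parallel=1):
        self.n_parallel = n_parallel

    @abstractmethod
    def get_context(self):
        """Return a handle to the device, local or remote (hypothetical hook)."""

    @abstractmethod
    def run_one(self, ctx, config):
        """Measure a single config on the given context (hypothetical hook)."""

    def run(self, configs):
        # Shared logic: acquire the context once, then run configs in parallel.
        ctx = self.get_context()
        with ThreadPoolExecutor(max_workers=self.n_parallel) as pool:
            # pool.map preserves the input order of the configs.
            return list(pool.map(lambda c: self.run_one(ctx, c), configs))


class LocalRunner(Runner):
    """Runs directly on the host; no RPC server or session is involved."""

    def get_context(self):
        return "local-cpu"  # placeholder standing in for a local TVM context

    def run_one(self, ctx, config):
        return (ctx, config)  # placeholder standing in for a real measurement


class RPCRunner(Runner):
    """Runs on a remote device reached over RPC."""

    def __init__(self, host, port, n_parallel=1):
        super().__init__(n_parallel)
        self.host, self.port = host, port

    def get_context(self):
        # Placeholder only: a real runner would open an RPC session here.
        return f"rpc://{self.host}:{self.port}"

    def run_one(self, ctx, config):
        return (ctx, config)  # placeholder standing in for a real measurement
```

   In this sketch, `LocalRunner(n_parallel=2).run(configs)` goes through the 
same `run` loop as `RPCRunner`, and only the two overridden hooks differ 
between the runners.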
   
   ### Evaluation
   
   Since the user interface and the underlying measurement remain the same, I 
only tested the first task extracted from ResNet-18 on CPU for evaluation. 
Both tests use `LocalRunner`.
   
   Without this PR:
   
   ```
   [Task  1/12]  Current/Best:   20.81/ 169.89 GFLOPS | Progress: (252/252) | 
556.76 s Done
   ```
   
   With this PR:
   
   ```
   [Task  1/12]  Current/Best:   17.89/ 150.35 GFLOPS | Progress: (252/252) | 
192.78 s Done
   ```
   
   I believe the GFLOPS difference is due to measurement error. The tuning 
time, however, clearly drops: from 556.76 s to 192.78 s, about 2.9x faster.
   
   ### Issue
   
   For VTA, `RPCRunner` reprograms the FPGA before every measurement, but I am 
not sure if we still need this step for the truly local runner. Please help 
review and provide your suggestions (cc @tmoreau89).
   
   @merrymercy @eqy @kevinthesun please help to review. Thanks.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
