remotego edited a comment on pull request #7: URL: https://github.com/apache/incubator-tvm-vta/pull/7#issuecomment-621581807
Thanks @huajsj @tmoreau89 for pointing out the potential performance issue for this fix. The root cause of the issue is because on line 124, we instruct the VTA to start calculation by writing to the VTA_START bit of control register. The FPGA will then respond by changing its VTA_COMPUTE_DONE bit of control register from '1' to '0', indicating a "busy" state of the VTA hardware. And VTA will change it back to '1' after the completion of the calculation. The original code created a racing condition here between the processor code and the FPGA hardware, by attempting to check the VTA_COMPUTE_DONE bit immediately after the issuing of "start" command. If we got a more powerful processor, the checking of VTA_COMPUTE_DONE bit will happen before FPGA could ever change it to '0'. As @tmoreau89 mentioned, we are using a pooling approach here. And in my opinion, a short sleep before checking is quite common in the pooling world. Here we proposed a 1us sleep, which translates to around ~250 clock cycles on FPGA, it would give FPGA sufficient time to change VTA_COMPUTE_DONE from '1' to '0'. As 250 cycles are way shorter than any meaningful calculations, I believe there should not be any performance impact. I agree with @tmoreau89 that interrupt is a more robust solution compare to pooling, but it is quite a huge task involving changes from both HW and driver archs. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org