remotego edited a comment on pull request #7:
URL: https://github.com/apache/incubator-tvm-vta/pull/7#issuecomment-621581807


   Thanks @huajsj @tmoreau89 for pointing out the potential performance issue 
for this fix.
   
   The root cause of the issue is because on line 124, we instruct the VTA to 
start calculation by writing to the VTA_START bit of control register. The FPGA 
will then respond by changing its VTA_COMPUTE_DONE bit of control register from 
'1' to '0', indicating a "busy" state of the VTA hardware. And VTA will change 
it back to '1' after the completion of the calculation.
   
   The original code created a racing condition here between the processor code 
and the FPGA hardware, by attempting to check the VTA_COMPUTE_DONE bit 
immediately after the issuing of "start" command. If we got a more powerful 
processor, the checking of VTA_COMPUTE_DONE bit will happen before FPGA could 
ever change it to '0'.
   
   As @tmoreau89 mentioned, we are using a pooling approach here. And in my 
opinion, a short sleep before checking is quite common in the pooling world. 
Here we proposed a 1us sleep, which translates to around ~250 clock cycles on 
FPGA, it would give FPGA sufficient time to change VTA_COMPUTE_DONE from '1' to 
'0'. As 250 cycles are way shorter than any meaningful calculations, I believe 
there should not be any performance impact.
   
   I agree with @tmoreau89 that interrupt is a more robust solution compare to 
pooling, but it is quite a huge task involving changes from both HW and driver 
archs. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to