Hi, as a new user I have some questions about using VTA in simulation and RPC 
server mode:

1. Are fully connected layers (and non-quantized convolutional layers) executed 
by the target CPU (the ARM CPU on the board), or by the host CPU (the x86 CPU of my computer)?

2. What exactly is measured when running VTA in tsim with the timer() function: 
only the part offloaded to VTA, or also the layers executed by the target ARM CPU? 
This is related to question 1. (The first sketch after this list shows how I call it.)

3. The value returned by the timer() function when I run the MXNet tutorial 
(https://tvm.apache.org/docs/vta/tutorials/frontend/deploy_classification.html#sphx-glr-vta-tutorials-frontend-deploy-classification-py)
 in tsim is about 90 seconds. Why is it so far from the results reported in the 
publication?

4. How should I interpret the simulation stats in tsim (cycle_count) and in fsim 
(inp_load_nbytes, etc.)?

5. Is it possible to measure execution time layer by layer to identify 
bottlenecks in the neural network? (The second sketch below shows what I have in mind.)
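
For context on questions 2 and 4, here is roughly how I collect the numbers, adapted from the tutorial; `m`, `ctx`, and `env` are the graph runtime module, context, and VTA environment built earlier in the tutorial script:

```python
from vta.testing import simulator

num = 4  # runs per measurement
rep = 3  # number of measurements

# time_evaluator wraps the whole "run" call of the graph runtime module
timer = m.module.time_evaluator("run", ctx, number=num, repeat=rep)

if env.TARGET in ["sim", "tsim"]:
    simulator.clear_stats()
    tcost = timer()
    sim_stats = simulator.stats()  # cycle_count in tsim; inp_load_nbytes etc. in fsim
    for key, value in sim_stats.items():
        print("%s: %d" % (key, value))
else:
    tcost = timer()

print("Mean inference time: %.2f s" % tcost.mean)
```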
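
For question 5, what I have in mind is something like TVM's debug graph runtime, which prints a per-node time breakdown. I am not sure whether it works unchanged when part of the graph is offloaded to VTA, so please treat this as a sketch:

```python
from tvm.contrib.debugger import debug_runtime

# graph, lib, params come from relay.build(), as in the tutorial
m_dbg = debug_runtime.create(graph, lib, ctx)
m_dbg.set_input(**params)
m_dbg.set_input("data", image)
m_dbg.run()  # prints per-layer execution times
```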

Thanks in advance :smiley:




