echuraev opened a new issue, #15405:
URL: https://github.com/apache/tvm/issues/15405

   `non_max_suppression` runs much faster on GraphExecutor than on 
VirtualMachine.
   
   ### Expected behavior
   I would expect the performance to be roughly the same on VM and GE.
   
   ### Actual behavior
   On my CPU (Intel Core i7-7700K), `non_max_suppression` runs about 3x slower 
on VM (1066.29 ms) than on GE (359.79 ms).
   I analyzed the problem with VTune Amplifier and saw that about 70% of the 
execution time is spent inside `lib.so` (the compiled model).
   
![image](https://github.com/apache/tvm/assets/5525113/b919c6ab-2ea2-4a89-aa9f-08697a11a491)
   
   In GE there is no such overhead.
   
   ### Environment
   Linux OS, latest mainline.
   
   ### Steps to reproduce
   The following script reproduces the problem. Its extension is changed to 
`.txt` because `.py` files cannot be uploaded to GitHub.
   [reproducer.txt](https://github.com/apache/tvm/files/12162179/reproducer.txt)
   
   At the top of the script, set the variable `USE_VM` to choose whether the 
layer is run on VM or on GE.
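   The linked reproducer is not inlined above, so as a minimal sketch of how the latencies can be compared, here is a generic warmup-plus-repeat timing harness. Note: `run_ge` and `run_vm` below are hypothetical placeholder workloads, not TVM API calls; in the real reproducer they would wrap `module.run()` on GraphExecutor and the VM's invoke call, respectively.

   ```python
   import time

   def mean_latency_ms(fn, warmup=3, repeats=10):
       """Run fn a few times untimed to warm caches/JIT state,
       then return the mean wall-clock latency in milliseconds."""
       for _ in range(warmup):
           fn()
       start = time.perf_counter()
       for _ in range(repeats):
           fn()
       return (time.perf_counter() - start) / repeats * 1000.0

   # Hypothetical stand-ins for the GE and VM inference calls.
   def run_ge():
       sum(i * i for i in range(10_000))

   def run_vm():
       sum(i * i for i in range(30_000))

   print(f"GE: {mean_latency_ms(run_ge):.2f} ms, VM: {mean_latency_ms(run_vm):.2f} ms")
   ```

   Measuring both executors with the same warmup and repeat counts avoids attributing one-time compilation or allocation cost to steady-state latency.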
   
   ### Triage
   
   Please refer to the list of label tags 
[here](https://github.com/apache/tvm/wiki/Issue-Triage-Labels) to find the 
relevant tags and add them below in a bullet format (example below).
   
   * flow:vm

