echuraev opened a new issue, #15405: URL: https://github.com/apache/tvm/issues/15405
`non_max_suppression` works much faster on GraphExecutor than on VirtualMachine.

### Expected behavior

I would expect the performance to be roughly the same on the VM and the GraphExecutor.

### Actual behavior

On my CPU (Intel Core i7-7700K), `non_max_suppression` runs about 3x slower on the VM (1066.29 ms) than on the GraphExecutor (359.79 ms). I analyzed the problem with VTune Amplifier and saw that about 70% of the execution time was spent in `lib.so` (the compiled model). The GraphExecutor shows no such overhead.

### Environment

Linux OS, latest mainline.

### Steps to reproduce

You can use the following script to reproduce the problem. Its extension was changed to `.txt` because `.py` files cannot be uploaded to GitHub: [reproducer.txt](https://github.com/apache/tvm/files/12162179/reproducer.txt) At the top of the script, set the variable `USE_VM` to choose whether the layer is run on the VM or on the GraphExecutor.

### Triage

Please refer to the list of label tags [here](https://github.com/apache/tvm/wiki/Issue-Triage-Labels) to find the relevant tags and add them below in a bullet format (example below).

* flow:vm
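For readers unfamiliar with the operator under discussion: `non_max_suppression` greedily keeps the highest-scoring box and discards boxes that overlap it beyond an IoU threshold. The sketch below is a minimal plain-Python illustration of that algorithm, not TVM's implementation (which is compiled into `lib.so`); the function and parameter names are chosen for illustration only.

```python
# Greedy non-maximum suppression, illustrative only (not TVM's kernel).
# Boxes are (x1, y1, x2, y2) tuples; scores are floats in [0, 1].

def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    # Repeatedly keep the highest-scoring remaining box and drop every
    # box whose overlap with it exceeds the threshold.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

The inner filtering pass makes the worst case O(n^2) in the number of boxes, which is why the operator's data-dependent output size and control flow make it sensitive to how the runtime (VM vs. GraphExecutor) dispatches it.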
