masahi edited a comment on pull request #8381:
URL: https://github.com/apache/tvm/pull/8381#issuecomment-872891424


   Interesting! Since our `intel_graphics` work was done a couple of years ago 
for older iGPUs, I didn't expect it to be relevant today. 
   
   1. I tested the following models and inputs on Gen11, which has a peak of 1.1 fp32 TFLOPS.
   ```
   mlperf ssd-resnet34: input shape (1, 3, 1200, 1200)
   DETR (https://github.com/facebookresearch/detr): input shape (1, 3, 750, 800)
   ```
   The results were compared against a GTX 1070 Ti, which has a peak of 8 TFLOPS.
   All tuning was done with the auto scheduler and NHWC layout.
   
   End-to-end results:
   mlperf ssd-resnet34
   ```
   gen11: 1116.40 ms
   1070 ti: 99.34 ms
   ```
   
   DETR
   ```
   gen11: 292.5 ms
   1070 ti: 33.1 ms
   ```
   I consider these results excellent for Gen11, given the difference in 
hardware peak and the fact that the auto scheduler was heavily optimized for CUDA + dGPU.
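
As a rough sanity check on the figures above, the slowdown on each model can be compared against the gap in peak throughput. This is only a sketch: normalizing by peak TFLOPS is an assumption made here for illustration (it ignores memory bandwidth and other factors), and the helper names `slowdown` and `relative_efficiency` are hypothetical, not part of TVM.

```python
# Figures quoted in the comment above.
PEAK_TFLOPS = {"gen11": 1.1, "1070 ti": 8.0}
LATENCY_MS = {
    "mlperf ssd-resnet34": {"gen11": 1116.40, "1070 ti": 99.34},
    "DETR": {"gen11": 292.5, "1070 ti": 33.1},
}


def slowdown(latencies):
    """How many times slower Gen11 is than the 1070 Ti (hypothetical helper)."""
    return latencies["gen11"] / latencies["1070 ti"]


def relative_efficiency(latencies, peaks=PEAK_TFLOPS):
    """Peak-throughput gap divided by the measured slowdown.

    1.0 would mean Gen11 uses its peak as effectively as the 1070 Ti does;
    this normalization is an illustrative assumption, not a TVM metric.
    """
    peak_ratio = peaks["1070 ti"] / peaks["gen11"]
    return peak_ratio / slowdown(latencies)


if __name__ == "__main__":
    for model, latencies in LATENCY_MS.items():
        print(
            f"{model}: {slowdown(latencies):.2f}x slower, "
            f"relative efficiency {relative_efficiency(latencies):.2f}"
        )
```

With these inputs, Gen11 is roughly 11.2x slower on ssd-resnet34 and 8.8x slower on DETR, against a peak gap of about 7.3x.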
   
   2. I can certainly try tuning with `intel_graphics`. Thanks for the 
suggestion!



