masahi edited a comment on pull request #8381: URL: https://github.com/apache/tvm/pull/8381#issuecomment-872891424
Interesting! Since our `intel_graphics` work was done a couple of years ago for older iGPUs, I didn't expect it to still be relevant today.

1. I tested the following models and inputs on Gen11, which has a peak of 1.1 fp32 TFLOPS.

```
mlperf ssd-resnet34: input shape (1, 3, 1200, 1200)
DETR (https://github.com/facebookresearch/detr): input shape (1, 3, 750, 800)
```

The results were compared against a GTX 1070 Ti, which has a peak of 8 TFLOPS. All tuning was done with the auto scheduler and NHWC layout.

End-to-end results:

mlperf ssd-resnet34
```
gen11: 1116.40 ms
1070 ti: 99.34 ms
```

DETR
```
gen11: 292.5 ms
1070 ti: 33.1 ms
```

I consider these results excellent for Gen11, given the difference in hardware peak and the fact that the auto scheduler was hyper-optimized for CUDA and dGPUs.

2. I can certainly try tuning with `intel_graphics`. Thanks for the suggestion!
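To make the hardware-peak argument in point 1 concrete, here is a rough back-of-the-envelope sketch. It uses only the latencies and peak TFLOPS figures quoted above; the function name and the idea of dividing the peak ratio by the measured slowdown are illustrative assumptions, not something taken from TVM. Peak fp32 throughput is also a crude proxy (it ignores memory bandwidth, for instance), so the output should be read as a sanity check rather than a rigorous efficiency metric.

```python
# Figures quoted in the comment above.
PEAK_TFLOPS = {"gen11": 1.1, "1070 ti": 8.0}
LATENCY_MS = {
    "mlperf ssd-resnet34": {"gen11": 1116.40, "1070 ti": 99.34},
    "DETR": {"gen11": 292.5, "1070 ti": 33.1},
}


def peak_normalized_efficiency(latency_ms, peak_tflops,
                               device="gen11", baseline="1070 ti"):
    """Hypothetical helper: compare the measured slowdown to the peak ratio.

    Returns (slowdown, peak_ratio, relative_efficiency), where a
    relative_efficiency of 1.0 would mean the device is slower than the
    baseline by exactly the ratio of their peak TFLOPS.
    """
    slowdown = latency_ms[device] / latency_ms[baseline]
    peak_ratio = peak_tflops[baseline] / peak_tflops[device]
    return slowdown, peak_ratio, peak_ratio / slowdown


for model, latency in LATENCY_MS.items():
    slowdown, peak_ratio, rel_eff = peak_normalized_efficiency(latency, PEAK_TFLOPS)
    print(f"{model}: {slowdown:.2f}x slower, peak ratio {peak_ratio:.2f}x, "
          f"relative efficiency {rel_eff:.2f}")
```

Under these assumptions, the Gen11 slowdown (roughly 11x and 9x) is not far from the roughly 7x gap in peak throughput, which is the sense in which the results look good for an iGPU.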
