masahi commented on pull request #10332: URL: https://github.com/apache/tvm/pull/10332#issuecomment-1047304225
Ok, here is the comparison of GOPS between the new VNNI implementation and the existing generic code. Note that the VNNI numbers were obtained after only 1 or 2 minutes of tuning, while the generic code has a very large tuning space and took more than 12 hours to reach these numbers under the same tuning options.

B | M | N | K | TVM VNNI | TVM generic (casts to fp32 and does AVX512 FMA)
-- | -- | -- | -- | -- | --
8 | 64 | 800 | 320 | 1862.9816985699251 | 471.93086647752153
8 | 64 | 768 | 512 | 1957.1780318372826 | 254.2322265717467
8 | 16 | 256 | 512 | 481.7846564891195 | 249.41214520865546
8 | 128 | 128 | 128 | 1940.7730023523345 | 372.7504095880382
8 | 256 | 512 | 256 | 2380.99163061598 | 496.7852808609268
8 | 1024 | 1024 | 1024 | 2275.097320545042 | 219.50257992579049
8 | 128 | 768 | 3072 | 1449.8759165025203 | 219.86756788442386
8 | 128 | 768 | 768 | 1883.3963380647226 | 234.35976664468328
8 | 128 | 3072 | 768 | 1595.616577196681 | 196.09770614852056
16 | 384 | 384 | 64 | 2487.792996038378 | 418.875373840064
16 | 384 | 64 | 384 | 2441.74586017639 | 301.37582872146345
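To make the gap easier to read, here is a minimal Python sketch that computes the per-shape speedup (VNNI GOPS divided by generic GOPS) from the numbers above. The names `ROWS`, `speedups`, and `est_time_us` are hypothetical and not part of TVM. The time estimate additionally assumes that GOPS counts `2 * B * M * N * K` operations per batched matmul, which the comment does not state.

```python
# (B, M, N, K, VNNI GOPS, generic GOPS), copied from the comparison above.
ROWS = [
    (8, 64, 800, 320, 1862.9816985699251, 471.93086647752153),
    (8, 64, 768, 512, 1957.1780318372826, 254.2322265717467),
    (8, 16, 256, 512, 481.7846564891195, 249.41214520865546),
    (8, 128, 128, 128, 1940.7730023523345, 372.7504095880382),
    (8, 256, 512, 256, 2380.99163061598, 496.7852808609268),
    (8, 1024, 1024, 1024, 2275.097320545042, 219.50257992579049),
    (8, 128, 768, 3072, 1449.8759165025203, 219.86756788442386),
    (8, 128, 768, 768, 1883.3963380647226, 234.35976664468328),
    (8, 128, 3072, 768, 1595.616577196681, 196.09770614852056),
    (16, 384, 384, 64, 2487.792996038378, 418.875373840064),
    (16, 384, 64, 384, 2441.74586017639, 301.37582872146345),
]


def est_time_us(b, m, n, k, gops):
    """Estimated runtime in microseconds.

    ASSUMPTION: GOPS was computed from 2*B*M*N*K operations
    (one multiply and one add per output accumulation step).
    """
    ops = 2 * b * m * n * k
    seconds = ops / (gops * 1e9)
    return seconds * 1e6


# Speedup of the VNNI implementation over the generic one, per shape.
speedups = [vnni / generic for (_, _, _, _, vnni, generic) in ROWS]

for (b, m, n, k, vnni, generic), s in zip(ROWS, speedups):
    print(f"B={b:<3} M={m:<5} N={n:<5} K={k:<5} "
          f"speedup={s:5.2f}x  est. VNNI time={est_time_us(b, m, n, k, vnni):9.1f} us")

print(f"min speedup: {min(speedups):.2f}x, max speedup: {max(speedups):.2f}x")
```

The ratio depends only on the reported GOPS, so it holds regardless of how operations were counted; only the estimated times rely on the stated assumption.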
