Meteorix commented on pull request #7146: URL: https://github.com/apache/tvm/pull/7146#issuecomment-750787545
> @Meteorix out of curiosity can you share some of your benchmarking results? I'd love to know how much faster this performs than cublas.

@jwfromm Here are some of the benchmark results (each tuned for 1000 trials). This schedule beats cublas on some shapes, which is also why I made `batch_matmul_cublas` autotunable in this PR.

```
Shape: [1, 64, 1024] [1, 4096, 1024]
batch_matmul_tensorcore.cuda 2.9238894640234948e-05
batch_matmul_cublas.cuda 2.7487557097865394e-05
batch_matmul.cuda 0.00014189747117647058

Shape: [1, 64, 1024] [1, 1024, 1024]
batch_matmul_tensorcore.cuda 1.5578384301061096e-05
batch_matmul_cublas.cuda 2.041829239101948e-05
batch_matmul.cuda 6.108717968157696e-05

Shape: [1, 128, 1024] [1, 4096, 1024]
batch_matmul_tensorcore.cuda 0.00011345079327976625
batch_matmul_cublas.cuda 0.00011074180193236715
batch_matmul.cuda 0.00024510443407707913

Shape: [1, 128, 4096] [1, 1024, 4096]
batch_matmul_tensorcore.cuda 0.00017083510384959715
batch_matmul_cublas.cuda 0.00010608833085714285
batch_matmul.cuda 0.00035638234315169367

Shape: [16, 128, 64] [16, 128, 64]
batch_matmul_cublas.cuda 6.046038943091678e-06
batch_matmul_tensorcore.cuda 4.134768131265665e-06
batch_matmul.cuda 1.2430305571941866e-05

Shape: [16, 128, 128] [16, 64, 128]
batch_matmul_tensorcore.cuda 4.74178964860194e-06
batch_matmul_cublas.cuda 9.463372359711623e-06
batch_matmul.cuda 1.4179731404708587e-05

Shape: [1, 128, 1024] [1, 1024, 1024]
batch_matmul_tensorcore.cuda 3.857668104222821e-05
batch_matmul_cublas.cuda 2.3704257450575394e-05
batch_matmul.cuda 0.0002515613367983368
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
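To make "beats cublas on some shapes" easier to read off the numbers in the comment, here is a minimal sketch that computes the ratio of the cublas figure to the tensorcore figure for each shape. It assumes the reported values are run times where lower is better (the comment does not state the unit); the dictionary keys and the `speedup_over_cublas` helper are illustrative names, not part of TVM.

```python
# Timings copied verbatim from the benchmark comment.
# Assumption: values are run times, so a lower number is better.
# Keys ("tensorcore", "cublas", "default") are shorthand for
# batch_matmul_tensorcore.cuda, batch_matmul_cublas.cuda, batch_matmul.cuda.
RESULTS = {
    "[1, 64, 1024] x [1, 4096, 1024]": {
        "tensorcore": 2.9238894640234948e-05,
        "cublas": 2.7487557097865394e-05,
        "default": 0.00014189747117647058,
    },
    "[1, 64, 1024] x [1, 1024, 1024]": {
        "tensorcore": 1.5578384301061096e-05,
        "cublas": 2.041829239101948e-05,
        "default": 6.108717968157696e-05,
    },
    "[1, 128, 1024] x [1, 4096, 1024]": {
        "tensorcore": 0.00011345079327976625,
        "cublas": 0.00011074180193236715,
        "default": 0.00024510443407707913,
    },
    "[1, 128, 4096] x [1, 1024, 4096]": {
        "tensorcore": 0.00017083510384959715,
        "cublas": 0.00010608833085714285,
        "default": 0.00035638234315169367,
    },
    "[16, 128, 64] x [16, 128, 64]": {
        "tensorcore": 4.134768131265665e-06,
        "cublas": 6.046038943091678e-06,
        "default": 1.2430305571941866e-05,
    },
    "[16, 128, 128] x [16, 64, 128]": {
        "tensorcore": 4.74178964860194e-06,
        "cublas": 9.463372359711623e-06,
        "default": 1.4179731404708587e-05,
    },
    "[1, 128, 1024] x [1, 1024, 1024]": {
        "tensorcore": 3.857668104222821e-05,
        "cublas": 2.3704257450575394e-05,
        "default": 0.0002515613367983368,
    },
}


def speedup_over_cublas(results):
    """Return cublas_time / tensorcore_time per shape (>1 means tensorcore is faster)."""
    return {
        shape: timings["cublas"] / timings["tensorcore"]
        for shape, timings in results.items()
    }


def main():
    for shape, ratio in speedup_over_cublas(RESULTS).items():
        winner = "tensorcore" if ratio > 1.0 else "cublas"
        print(f"{shape}: cublas/tensorcore = {ratio:.2f} ({winner} faster)")


if __name__ == "__main__":
    main()
```

Under that lower-is-better reading, the tensorcore schedule comes out ahead of cublas on three of the seven shapes and ahead of the default `batch_matmul.cuda` schedule on all seven.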
