ANSHUMAN87 commented on pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#issuecomment-700929567


   > I've written a faster sparse_dense for GPUs using TIR. This sparse_dense
   > requires a padded matrix, so I've added a new op, sparse_dense_padded.
   > AlterOpLayout should transform sparse_dense to sparse_dense_padded when
   > using a GPU.
   >
   > This new sparse_dense improves PruneBERT performance from 155.41 ms mean to
   > 7.75 ms mean. In general, this implementation is faster than cuBLAS dense on
   > matrices with density < 0.05 and is often faster than cuSPARSE for machine
   > learning workloads.
   
   @tkonolige: Thanks for the PR! The data looks quite impressive :+1:
   I was wondering whether we could add some sort of benchmark test case here,
   tuned to the data you shared?
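
To make the request concrete, here is a rough sketch of the shape such a benchmark test case could take. It is not TVM code and does not reflect the PR's implementation: it is a pure-Python CPU stand-in that assumes the weight matrix is stored in CSR form and that the op computes Y = X * W^T. All function names and the default sizes are hypothetical; only the 0.05 density comes from the description above.

```python
import random
import time


def random_csr(rows, cols, density, rng):
    """Build a random CSR matrix (data, indices, indptr) with roughly the given density."""
    data, indices, indptr = [], [], [0]
    for _ in range(rows):
        for c in range(cols):
            if rng.random() < density:
                data.append(rng.uniform(-1.0, 1.0))
                indices.append(c)
        indptr.append(len(data))
    return data, indices, indptr


def csr_to_dense(data, indices, indptr, cols):
    """Expand a CSR matrix into a list of dense rows."""
    dense = []
    for r in range(len(indptr) - 1):
        row = [0.0] * cols
        for j in range(indptr[r], indptr[r + 1]):
            row[indices[j]] = data[j]
        dense.append(row)
    return dense


def sparse_dense_ref(x, w_data, w_indices, w_indptr):
    """Reference Y = X * W^T with W in CSR form, visiting only stored entries."""
    out = []
    for x_row in x:
        y_row = []
        for r in range(len(w_indptr) - 1):
            acc = 0.0
            for j in range(w_indptr[r], w_indptr[r + 1]):
                acc += x_row[w_indices[j]] * w_data[j]
            y_row.append(acc)
        out.append(y_row)
    return out


def dense_ref(x, w):
    """Reference Y = X * W^T with W stored densely."""
    return [[sum(a * b for a, b in zip(x_row, w_row)) for w_row in w] for x_row in x]


def run_benchmark(m=8, n=64, k=64, density=0.05, seed=0):
    """Time both references on the same inputs and report the largest mismatch."""
    rng = random.Random(seed)
    x = [[rng.uniform(-1.0, 1.0) for _ in range(k)] for _ in range(m)]
    w_data, w_indices, w_indptr = random_csr(n, k, density, rng)
    w = csr_to_dense(w_data, w_indices, w_indptr, k)

    t0 = time.perf_counter()
    y_sparse = sparse_dense_ref(x, w_data, w_indices, w_indptr)
    t1 = time.perf_counter()
    y_dense = dense_ref(x, w)
    t2 = time.perf_counter()

    max_err = max(abs(a - b) for rs, rd in zip(y_sparse, y_dense) for a, b in zip(rs, rd))
    return {
        "shape": (len(y_sparse), len(y_sparse[0])),
        "density": len(w_data) / (n * k),
        "sparse_s": t1 - t0,
        "dense_s": t2 - t1,
        "max_err": max_err,
    }


if __name__ == "__main__":
    print(run_benchmark())
```

A real test would replace the two reference functions with the compiled sparse_dense and dense kernels and sweep the density around the 0.05 crossover; the correctness check against the dense result would stay the same.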
   
   



