ZiyueHuang opened a new pull request #8611: optimization for dot(csr.T, dense) = rsp
URL: https://github.com/apache/incubator-mxnet/pull/8611

## Description ##
Use a prefix sum to compute `nnr` (the number of non-zero rows) so that the row_sparse output can be allocated at the size it actually needs. Currently, `dot(csr.T, dense) = rsp` allocates a dense output and then casts it to row_sparse, but does not free the unused memory.

I benchmarked with `run_benchmark(context, lhs="csr", rhs="default", lhs_trans=True, ...)` in `mxnet/benchmark/python/sparse/dot.py`. Please correct me if I'm wrong, but is `dot(csr.T, dense) = rsp` on master really this slow? It might be because others were using my machine at the same time.

Performance of the original `dot(csr.T, dense) = rsp`:
```
[hanfeng@model-gpu00:sparse]$ python dot.py --num-omp-threads 20
========================================================
mxnet sparse dot benchmark: dot(csr, default) = default
(matrix multiplication: (m x k)^T * (k x n) = m x n)
========================================================
lhs_density(%)  rhs_density(%)  context    m        k     n  t_sparse(ms)  t_dense(ms)  speedup
           1.0           100.0   cpu(0)  128  1000000   256        366.19       135.76     0.37
           1.0           100.0   cpu(0)  128  1000000  1000       1327.12       503.92     0.38
           1.0           100.0   cpu(0)  128  1000000  1000       1237.33       454.01     0.37
           1.0           100.0   cpu(0)   64  1000000  1000        868.38       345.38     0.40
           1.0           100.0   cpu(0)  128  1000000  1000       1237.09       437.32     0.35
```

After this PR:
```
[hanfeng@model-gpu00:sparse]$ python dot.py --num-omp-threads 20
========================================================
mxnet sparse dot benchmark: dot(csr, default) = default
(matrix multiplication: (m x k)^T * (k x n) = m x n)
========================================================
lhs_density(%)  rhs_density(%)  context    m        k     n  t_sparse(ms)  t_dense(ms)  speedup
           1.0           100.0   cpu(0)  128  1000000   256         83.90       137.18     1.64
           1.0           100.0   cpu(0)  128  1000000  1000        410.63       448.30     1.09
           1.0           100.0   cpu(0)  128  1000000  1000        467.91       492.87     1.05
           1.0           100.0   cpu(0)   64  1000000  1000        259.99       348.32     1.34
           1.0           100.0   cpu(0)  128  1000000  1000        481.77       416.20     0.86
```

cc @eric-haibin-lin

## Checklist ##
### Essentials ###
- [x] Passed code style checking (`make lint`)
- [x] Changes are complete (i.e. I finished coding on this PR)
- [x] All changes have test coverage
- [x] For user-facing API changes, API doc string has been updated.
- [x] To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

### Changes ###
- [x] unittests already exist

## Comments ##
- If this change is a backward incompatible change, why must this change be made.
- Interesting edge cases to note here
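
For readers unfamiliar with the approach named in the Description, here is a minimal pure-Python sketch of using a prefix sum to find `nnr` before allocating a row_sparse result for `dot(csr.T, dense)`. It is an illustration only, not the C++ kernel in this PR: the function name `dot_csr_t_dense_rsp` and the variables `row_flg`, `prefix`, and `row_idx` are hypothetical, and the CSR matrix is assumed to be passed as plain `data`, `indices`, and `indptr` lists.

```python
from itertools import accumulate


def dot_csr_t_dense_rsp(data, indices, indptr, num_cols, rhs):
    """Sketch: dot(csr.T, rhs) returned as (row_idx, out) in a row_sparse-like layout.

    The csr matrix has shape (m, num_cols); rhs is a dense list of m rows.
    """
    n = len(rhs[0]) if rhs else 0

    # Step 1: flag every csr column that has a stored entry. Column c of csr
    # is row c of csr.T, so it is also row c of the output.
    # (Explicitly stored zeros are flagged too in this sketch.)
    row_flg = [0] * num_cols
    for c in indices:
        row_flg[c] = 1

    # Step 2: an inclusive prefix sum over the flags. The last element is nnr,
    # and prefix[c] - 1 is the compact position of output row c.
    prefix = list(accumulate(row_flg))
    nnr = prefix[-1] if prefix else 0

    # Step 3: allocate only nnr rows instead of a full num_cols x n dense buffer.
    row_idx = [0] * nnr
    for c in range(num_cols):
        if row_flg[c]:
            row_idx[prefix[c] - 1] = c
    out = [[0.0] * n for _ in range(nnr)]

    # Step 4: accumulate data[j] * rhs[i] into the compact row for each entry.
    for i in range(len(indptr) - 1):
        for j in range(indptr[i], indptr[i + 1]):
            dst = out[prefix[indices[j]] - 1]
            for t in range(n):
                dst[t] += data[j] * rhs[i][t]
    return row_idx, out


# A 2 x 4 csr matrix [[0, 2, 0, 1], [0, 3, 0, 0]] and a 2 x 2 dense rhs.
row_idx, out = dot_csr_t_dense_rsp(
    data=[2.0, 1.0, 3.0], indices=[1, 3, 1], indptr=[0, 2, 3],
    num_cols=4, rhs=[[1.0, 2.0], [10.0, 20.0]],
)
print(row_idx)  # [1, 3]
print(out)      # [[32.0, 64.0], [1.0, 2.0]]
```

In this toy case only two of the four possible output rows are stored, which is the memory saving the Description refers to. A parallel implementation would presumably replace the serial `accumulate` call with a parallel scan.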