ZiyueHuang opened a new pull request #8611: optimization for dot(csr.T, dense) = rsp
URL: https://github.com/apache/incubator-mxnet/pull/8611
 
 
   ## Description ##
   
   Use a prefix sum to compute `nnr` (the number of non-zero rows) so that the row_sparse output can be allocated at exactly the right size.
   
   Currently `dot(csr.T, dense) = rsp` will allocate the dense output and then 
cast it to row_sparse, but not free the unused memory.
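
   The prefix-sum idea can be sketched in NumPy as follows. This is a minimal illustration of the counting step only, not the actual C++ kernel in this PR; the helper name `rowsparse_alloc_info` is hypothetical. For `dot(A.T, B)` with `A` in CSR format, output row `j` can be non-zero only if column `j` of `A` holds at least one entry, so a flag array over the columns of `A`, reduced by a prefix sum, yields `nnr` and the row indices to allocate:

   ```python
   import numpy as np

   def rowsparse_alloc_info(csr_indices, num_cols):
       """Determine which rows of dot(A.T, B) can be non-zero.

       csr_indices: column indices of the CSR matrix A (the `indices` array).
       num_cols:    number of columns of A (= number of rows of the output).
       Returns (nnr, row_idx) for allocating the row_sparse output.
       """
       flags = np.zeros(num_cols, dtype=np.int64)
       flags[csr_indices] = 1          # mark columns of A that hold entries
       prefix = np.cumsum(flags)       # inclusive prefix sum over the flags
       nnr = int(prefix[-1])           # number of non-zero output rows
       row_idx = np.nonzero(flags)[0]  # indices of those rows
       return nnr, row_idx

   # Example: A has entries in columns 0, 2, 2, 5 out of 6 columns,
   # so only output rows 0, 2, and 5 need to be allocated.
   nnr, row_idx = rowsparse_alloc_info(np.array([0, 2, 2, 5]), num_cols=6)
   ```

   In the parallel kernel the prefix sum additionally gives each non-zero row its position in the compacted output, so every thread knows where to write without extra synchronization.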
   
   I used `run_benchmark(context, lhs="csr", rhs="default", lhs_trans=True, ...)` in `mxnet/benchmark/python/sparse/dot.py`. Please correct me if I'm wrong.
   
   But is `dot(csr.T, dense) = rsp` in master slow like this? Might due to 
others are using my machine at the same time?
   
   Performance of the original `dot(csr.T, dense) = rsp`:
   
   ```
   [hanfeng@model-gpu00:sparse]$ python dot.py --num-omp-threads 20
   ========================================================
     mxnet sparse dot benchmark: dot(csr, default) = default
     (matrix multiplication: (m x k)^T * (k x n) = m x n)
   ========================================================
    lhs_density(%)  rhs_density(%)    context        m        k        n  t_sparse(ms)   t_dense(ms)  speedup
               1.0           100.0     cpu(0)      128  1000000      256        366.19        135.76     0.37
               1.0           100.0     cpu(0)      128  1000000     1000       1327.12        503.92     0.38
               1.0           100.0     cpu(0)      128  1000000     1000       1237.33        454.01     0.37
               1.0           100.0     cpu(0)       64  1000000     1000        868.38        345.38     0.40
               1.0           100.0     cpu(0)      128  1000000     1000       1237.09        437.32     0.35
   ```
   
   After this PR:
   ```
   [hanfeng@model-gpu00:sparse]$ python dot.py --num-omp-threads 20
   ========================================================
     mxnet sparse dot benchmark: dot(csr, default) = default
     (matrix multiplication: (m x k)^T * (k x n) = m x n)
   ========================================================
    lhs_density(%)  rhs_density(%)    context        m        k        n  t_sparse(ms)   t_dense(ms)  speedup
               1.0           100.0     cpu(0)      128  1000000      256         83.90        137.18     1.64
               1.0           100.0     cpu(0)      128  1000000     1000        410.63        448.30     1.09
               1.0           100.0     cpu(0)      128  1000000     1000        467.91        492.87     1.05
               1.0           100.0     cpu(0)       64  1000000     1000        259.99        348.32     1.34
               1.0           100.0     cpu(0)      128  1000000     1000        481.77        416.20     0.86
   ```
   cc @eric-haibin-lin 
   
   ## Checklist ##
   ### Essentials ###
   - [x] Passed code style checking (`make lint`)
   - [x] Changes are complete (i.e. I finished coding on this PR)
   - [x] All changes have test coverage
   - [x] For user-facing API changes, API doc string has been updated.
   - [x] To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
   
   ### Changes ###
   - [x] unittests already exist
   
   ## Comments ##
   - If this change is a backward incompatible change, why must this change be made.
   - Interesting edge cases to note here
   