[ 
https://issues.apache.org/jira/browse/SYSTEMML-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janardhan updated SYSTEMML-1140:
--------------------------------
    Description: 
We have identified two performance bugs that frequently occur in deep learning 
scripts.

First, we repeatedly perform unnecessary conversions to sparse format, even 
though operations such as matrix multiplication (including BLAS and cuBLAS) 
are optimized for dense inputs.

Second, even with a large memory budget, we sometimes spend almost 20-30% of 
the time in caching.

[~mboehm7] [~reinwald] [~mwdus...@us.ibm.com] I am labeling this bug as a 
blocker for SystemML 1.0. Please feel free to assign this issue to yourself.

*Improvements so far:*

1. Disabled sparse conversions and caching, by 
[commit|https://github.com/apache/systemml/commit/caaaec90b61e529e50021d89f9f108230fa307a8]

2. Binary sparse-dense mult/div with preallocation, by 
[commit|https://github.com/apache/systemml/commit/4f86485939d4777d2799a697b2cbc23ea93ee7e4]

3. For `conv_2d_bias_add`, the `elementWiseInPlaceTransposedAddition` method 
first aggregates partial blocks without a transpose, then does a 
cache-conscious transpose to the output, by 
[commit|https://github.com/apache/systemml/commit/de1e119de0b2fc2a6c6a2c57bf64c4172a26890d]

4. Reduced serialization overhead of sparse matrices (in MCSR) on buffer pool 
write by using the inMemorySize of the cache block, by 
[commit|https://github.com/apache/systemml/commit/a68648ded00b0dc2510cd16ae8a0e5fa7ae822c3]

5. Improved performance of removeEmpty(rows) and order via shallow copy of 
sparse rows, exploiting the fact that removeEmpty(rows) and order do not 
modify the actual sparse rows, by commit
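
As a rough illustration of item 5, the following Java sketch shows how dropping empty rows can reuse the existing row objects by reference instead of deep-copying them. The `SparseRow` class and the `removeEmptyRows` method are hypothetical stand-ins introduced only for this example; they are not SystemML's actual classes or API. The sketch assumes, as stated above, that the operation never modifies the rows it keeps.

```java
import java.util.ArrayList;
import java.util.List;

public class ShallowRemoveEmptyRows {
    /** Hypothetical stand-in for one sparse row: column indexes plus non-zero values. */
    static final class SparseRow {
        final int[] indexes;
        final double[] values;

        SparseRow(int[] indexes, double[] values) {
            this.indexes = indexes;
            this.values = values;
        }

        boolean isEmpty() {
            return values.length == 0;
        }
    }

    /**
     * Drops null and empty rows. Kept rows are reused by reference
     * (shallow copy); this is only safe because the rows are not modified.
     */
    static SparseRow[] removeEmptyRows(SparseRow[] rows) {
        List<SparseRow> kept = new ArrayList<>();
        for (SparseRow row : rows) {
            if (row != null && !row.isEmpty()) {
                kept.add(row); // no deep copy of indexes/values
            }
        }
        return kept.toArray(new SparseRow[0]);
    }

    public static void main(String[] args) {
        SparseRow[] in = {
            new SparseRow(new int[] {0, 3}, new double[] {1.0, 2.0}),
            null,
            new SparseRow(new int[0], new double[0]),
            new SparseRow(new int[] {2}, new double[] {5.0})
        };
        SparseRow[] out = removeEmptyRows(in);
        System.out.println(out.length);       // number of non-empty rows
        System.out.println(out[0] == in[0]);  // same object, not a copy
    }
}
```

The trade-off in this sketch is that input and output share row objects, so any later in-place update to a shared row would be visible through both.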

  was:
We have identified two performance bugs that frequently occur in deep learning 
scripts.

First, we repeatedly perform unnecessary conversions to sparse format, even 
though operations such as matrix multiplication (including BLAS and cuBLAS) 
are optimized for dense inputs.

Second, even with a large memory budget, we sometimes spend almost 20-30% of 
the time in caching.

[~mboehm7] [~reinwald] [~mwdus...@us.ibm.com] I am labeling this bug as a 
blocker for SystemML 1.0. Please feel free to assign this issue to yourself.

*Improvements so far:*

1. Disabled sparse conversions and caching, by 
[commit|https://github.com/apache/systemml/commit/caaaec90b61e529e50021d89f9f108230fa307a8]

2. Binary sparse-dense mult/div with preallocation, by 
[commit|https://github.com/apache/systemml/commit/4f86485939d4777d2799a697b2cbc23ea93ee7e4]

3. For `conv_2d_bias_add`, the `elementWiseInPlaceTransposedAddition` first 


> Sparse/Caching performance bugs related to deep learning scripts
> ----------------------------------------------------------------
>
>                 Key: SYSTEMML-1140
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1140
>             Project: SystemML
>          Issue Type: Bug
>    Affects Versions: SystemML 1.0.0, SystemML 1.1
>            Reporter: Niketan Pansare
>            Priority: Blocker
>
> We have identified two performance bugs that frequently occur in deep 
> learning scripts.
> First, we repeatedly perform unnecessary conversions to sparse format, even 
> though operations such as matrix multiplication (including BLAS and cuBLAS) 
> are optimized for dense inputs.
> Second, even with a large memory budget, we sometimes spend almost 20-30% of 
> the time in caching.
> [~mboehm7] [~reinwald] [~mwdus...@us.ibm.com] I am labeling this bug as a 
> blocker for SystemML 1.0. Please feel free to assign this issue to yourself.
> *Improvements so far:*
> 1. Disabled sparse conversions and caching, by 
> [commit|https://github.com/apache/systemml/commit/caaaec90b61e529e50021d89f9f108230fa307a8]
> 2. Binary sparse-dense mult/div with preallocation, by 
> [commit|https://github.com/apache/systemml/commit/4f86485939d4777d2799a697b2cbc23ea93ee7e4]
> 3. For `conv_2d_bias_add`, the `elementWiseInPlaceTransposedAddition` method 
> first aggregates partial blocks without a transpose, then does a 
> cache-conscious transpose to the output, by 
> [commit|https://github.com/apache/systemml/commit/de1e119de0b2fc2a6c6a2c57bf64c4172a26890d]
> 4. Reduced serialization overhead of sparse matrices (in MCSR) on buffer pool 
> write by using the inMemorySize of the cache block, by 
> [commit|https://github.com/apache/systemml/commit/a68648ded00b0dc2510cd16ae8a0e5fa7ae822c3]
> 5. Improved performance of removeEmpty(rows) and order via shallow copy of 
> sparse rows, exploiting the fact that removeEmpty(rows) and order do not 
> modify the actual sparse rows, by commit



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
