[ 
https://issues.apache.org/jira/browse/SYSTEMML-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hu updated SYSTEMML-1774:
-----------------------------
    Description: When running the  [distributed MNIST LeNet example | 
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
 each mini-batch could ideally run in parallel without interaction. We tried to 
force {{parfor (j in 1:parallel_batches)}} at line 137 of 
{{nn/examples/mnist_lenet_distrib_sgd.dml}} to use {{REMOTE_SPARK}} mode by 
changing it to {{parfor (j in 1:parallel_batches, mode=REMOTE_SPARK, 
opt=CONSTRAINED)}}, but got the error 
{{org.apache.sysml.runtime.DMLRuntimeException: Not supported: Instructions of 
type other than CP instructions}} on the local machine and a 
{{java.lang.NullPointerException}} on the Spark cluster. More log information 
can be found in the comments below.   (was: When 
running the  [distributed MNIST LeNet example | 
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
 each mini-batch could ideally run in parallel without interaction. We tried to 
force {{parfor (j in 1:parallel_batches)}} at line 137 of 
{{nn/examples/mnist_lenet_distrib_sgd.dml}} to use {{REMOTE_SPARK}} mode by 
changing it to {{parfor (j in 1:parallel_batches, mode=REMOTE_SPARK, 
opt=CONSTRAINED)}}, but got the error 
{{org.apache.sysml.runtime.DMLRuntimeException: Not supported: Instructions of 
type other than CP instructions}}. More log information can be found in the 
comments below. One example of the errors is that at the convolutional layer, 
we need to randomly generate some matrices, but SystemML chooses 
{{RandSPInstruction}} instead of {{DataGenCPInstruction}}, which may be because 
SystemML could not determine the number of rows of the matrix. For this 
distributed MNIST LeNet example, using CPInstruction may achieve better 
performance. )

> Improve Parfor parallelism for deep learning
> --------------------------------------------
>
>                 Key: SYSTEMML-1774
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1774
>             Project: SystemML
>          Issue Type: Improvement
>          Components: Algorithms, Compiler, ParFor
>    Affects Versions: SystemML 1.0
>            Reporter: Fei Hu
>              Labels: deeplearning
>
> When running the  [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  each mini-batch could ideally run in parallel without interaction. We tried to 
> force {{parfor (j in 1:parallel_batches)}} at line 137 of 
> {{nn/examples/mnist_lenet_distrib_sgd.dml}} to use {{REMOTE_SPARK}} mode by 
> changing it to {{parfor (j in 1:parallel_batches, mode=REMOTE_SPARK, 
> opt=CONSTRAINED)}}, but got the error 
> {{org.apache.sysml.runtime.DMLRuntimeException: Not supported: Instructions 
> of type other than CP instructions}} on the local machine and a 
> {{java.lang.NullPointerException}} on the Spark cluster. More log information 
> can be found in the comments below. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
