[https://issues.apache.org/jira/browse/SYSTEMML-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092262#comment-16092262]

Mike Dusenberry edited comment on SYSTEMML-1774 at 7/18/17 10:08 PM:
---------------------------------------------------------------------

[~mboehm7] Okay, thanks. I would have thought that the SPARK execution mode would
force the parfor to run as a distributed Spark operation, so that the loop
bodies would be compiled to CP operations running on each worker. It sounds like
it is the opposite: the parfor runs as a local CP parfor, and its bodies consist
of Spark ops where possible. That should probably be documented somewhere.

Do you have any thoughts as to why the SPARK execution mode is 3x faster than 
the HYBRID_SPARK execution mode?



> Improve Parfor parallelism for deep learning
> --------------------------------------------
>
>                 Key: SYSTEMML-1774
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1774
>             Project: SystemML
>          Issue Type: Improvement
>          Components: Algorithms, Compiler, ParFor
>    Affects Versions: SystemML 1.0
>            Reporter: Fei Hu
>              Labels: deeplearning
>         Attachments: Explain_For_HYBRID_SPARK_Mode_With_ErrorInfo.txt, 
> Explain_For_Spark_Mode.txt, MNIST_Distrib_Sgd.scala, 
> mnist_lenet_distrib_sgd.dml
>
>
> When running the [distributed MNIST LeNet example|https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
> each mini-batch could ideally run in parallel without interaction. We tried to
> force {{parfor (j in 1:parallel_batches)}} at line 137 of
> {{nn/examples/mnist_lenet_distrib_sgd.dml}} to run in {{REMOTE_SPARK}} mode by
> changing it to {{parfor (j in 1:parallel_batches, mode=REMOTE_SPARK, opt=CONSTRAINED)}},
> but got the error {{org.apache.sysml.runtime.DMLRuntimeException: Not supported:
> Instructions of type other than CP instructions}} with execution mode {{SPARK}},
> and a {{java.lang.NullPointerException}} with execution mode {{HYBRID_SPARK}}.
> More log information can be found in the following comments.
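
The property the issue relies on, that the mini-batch iterations are independent, can be illustrated with a small sketch. This is not SystemML or DML code: it assumes a toy one-parameter linear model, and the names `batch_gradient` and `parallel_batches_step` are hypothetical. It only mimics the shape of a `parfor (j in 1:parallel_batches)` loop, assuming each iteration reads the shared weights, computes a gradient on its own slice, writes nothing shared, and the gradients are aggregated after the loop.

```python
from concurrent.futures import ThreadPoolExecutor


def batch_gradient(w, xs, ys):
    """Gradient d/dw of mean((w*x - y)^2) over one mini-batch (toy model)."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def parallel_batches_step(w, xs, ys, parallel_batches, batch_size, lr):
    """Hypothetical SGD step: each of `parallel_batches` mini-batch gradients
    is computed independently, then the gradients are averaged once."""
    if len(xs) < parallel_batches * batch_size:
        raise ValueError("not enough rows for parallel_batches * batch_size")

    def work(j):
        # Iteration j reads only its own slice and the read-only w,
        # so iterations never interact and may run in any order.
        beg = j * batch_size
        end = beg + batch_size
        return batch_gradient(w, xs[beg:end], ys[beg:end])

    with ThreadPoolExecutor() as pool:
        grads = list(pool.map(work, range(parallel_batches)))

    # Aggregate only after the parallel loop: average, then update w once.
    return w - lr * sum(grads) / len(grads)
```

For example, `parallel_batches_step(0.0, [1.0] * 4, [1.0] * 4, parallel_batches=2, batch_size=2, lr=0.1)` returns `0.2`. Threads are used only to make the independence explicit; under CPython's GIL they will not speed up this pure-Python arithmetic, whereas a remote parfor would distribute such iterations across workers.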



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
