[ 
https://issues.apache.org/jira/browse/SYSTEMML-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091976#comment-16091976
 ] 

Matthias Boehm edited comment on SYSTEMML-1774 at 7/18/17 6:43 PM:
-------------------------------------------------------------------

ad 2) The combination of forced spark execution mode and parfor REMOTE_SPARK is 
invalid because it would require running all operations as distributed spark 
operations while the surrounding parfor also runs as a distributed spark 
operation. This is not allowed because nested spark/mapreduce operations (i.e., 
RDD operations that call another RDD operation) are not supported, since they 
could lead to deadlocks. By specifying spark (and thus forcing a local parfor), 
you effectively run multiple concurrent distributed operations on the cluster, 
which leads to full cluster utilization on small data.
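
To make the rule above concrete, here is a minimal Python sketch of the 
combinations discussed in this comment. It is an illustrative model only, not 
SystemML's compiler code: the names ExecMode, ParforMode and check_combination 
are hypothetical, and the reason strings simply paraphrase the explanation 
above.

```python
from enum import Enum


class ExecMode(Enum):
    # Hypothetical model of the two execution modes mentioned in this issue.
    SPARK = "spark"                # forced: every operation is a distributed op
    HYBRID_SPARK = "hybrid_spark"  # operations may run locally (CP) or on spark


class ParforMode(Enum):
    # Hypothetical model of the two parfor modes discussed here.
    LOCAL = "LOCAL"
    REMOTE_SPARK = "REMOTE_SPARK"


def check_combination(exec_mode, parfor_mode):
    """Return (is_valid, reason) for an exec-mode / parfor-mode pair."""
    body_is_distributed = exec_mode is ExecMode.SPARK
    parfor_is_distributed = parfor_mode is ParforMode.REMOTE_SPARK

    if body_is_distributed and parfor_is_distributed:
        # A distributed parfor whose body is itself distributed would nest
        # one RDD operation inside another, which is not supported.
        return False, "nested distributed operations"
    if body_is_distributed:
        # Local parfor workers each submit distributed operations.
        return True, "local parfor with concurrent distributed operations"
    if parfor_is_distributed:
        # Remote parfor workers are expected to run CP instructions only.
        return True, "remote parfor with CP instructions in the body"
    return True, "local parfor"


if __name__ == "__main__":
    for exec_mode in ExecMode:
        for parfor_mode in ParforMode:
            valid, reason = check_combination(exec_mode, parfor_mode)
            print(exec_mode.value, parfor_mode.value, valid, reason)
```

The only rejected pair in this model is forced spark together with 
REMOTE_SPARK, which matches the "Not supported: Instructions of type other than 
CP instructions" error reported in the issue description.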


> Improve Parfor parallelism for deep learning
> --------------------------------------------
>
>                 Key: SYSTEMML-1774
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1774
>             Project: SystemML
>          Issue Type: Improvement
>          Components: Algorithms, Compiler, ParFor
>    Affects Versions: SystemML 1.0
>            Reporter: Fei Hu
>              Labels: deeplearning
>         Attachments: Explain_For_HYBRID_SPARK_Mode_With_ErrorInfo.txt, 
> Explain_For_Spark_Mode.txt, MNIST_Distrib_Sgd.scala, 
> mnist_lenet_distrib_sgd.dml
>
>
> When running the  [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  each mini-batch could ideally run in parallel without interaction. We tried 
> to force {{parfor (j in 1:parallel_batches)}} at line 137 of 
> {{nn/examples/mnist_lenet_distrib_sgd.dml}} to use {{REMOTE_SPARK}} mode by 
> changing it to {{parfor (j in 1:parallel_batches, mode=REMOTE_SPARK, 
> opt=CONSTRAINED)}}, but got the error 
> {{org.apache.sysml.runtime.DMLRuntimeException: Not supported: Instructions 
> of type other than CP instructions}} using the mode {{SPARK}}, and the error 
> {{java.lang.NullPointerException}} using the mode {{HYBRID_SPARK}}. More log 
> information can be found in the following comments. 


