[ https://issues.apache.org/jira/browse/SYSTEMML-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092508#comment-16092508 ]

Matthias Boehm edited comment on SYSTEMML-1774 at 7/19/17 3:18 AM:
-------------------------------------------------------------------

ok after some initial debugging with {{hybrid_spark + parfor}} and a driver Xmx 
of 4g, it seems the parfor optimizer decided on a degree of parallelism of 1 
(single-threaded, which caused the slowdown) due to the following (unknown) 
memory estimates:
{code}
17/07/18 19:50:15 WARN opt.CostEstimator: Memory estimate larger than budget 
but CP exec type (op=BIAS_ADD, name=26_out, memest=7.635730732E9).
17/07/18 19:50:15 WARN opt.CostEstimator: Memory estimate larger than budget 
but CP exec type (op=MAX_POOLING, name=28_out, memest=7.63573052E9).
17/07/18 19:50:15 WARN opt.CostEstimator: Memory estimate larger than budget 
but CP exec type (op=BIAS_ADD, name=29_out, memest=7.635730988E9).
17/07/18 19:50:15 WARN opt.CostEstimator: Memory estimate larger than budget 
but CP exec type (op=MAX_POOLING, name=31_out, memest=7.63573052E9).
17/07/18 19:50:15 WARN opt.CostEstimator: Memory estimate larger than budget 
but CP exec type (op=MAX_POOLING_BACKWARD, name=42_dX, memest=1.1453595736E10).
17/07/18 19:50:15 WARN opt.CostEstimator: Memory estimate larger than budget 
but CP exec type (op=DIRECT_CONV2D_BACKWARD_DATA, name=45_dX, 
memest=7.636140164E9).
17/07/18 19:50:15 WARN opt.CostEstimator: Memory estimate larger than budget 
but CP exec type (op=DIRECT_CONV2D_BACKWARD_FILTER, name=45_dW, 
memest=7.636140164E9).
17/07/18 19:50:15 WARN opt.CostEstimator: Memory estimate larger than budget 
but CP exec type (op=MAX_POOLING_BACKWARD, name=46_dX, memest=1.1453595736E10).
17/07/18 19:50:15 WARN opt.CostEstimator: Memory estimate larger than budget 
but CP exec type (op=DIRECT_CONV2D_BACKWARD_FILTER, name=48_dW, 
memest=3.819088816E9).
{code}
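For context, the clamping behavior can be sketched as follows. This is a simplified illustration of a memory-budget check, not the actual {{opt.CostEstimator}} logic; the usable-budget fraction and the clamping formula are assumptions:

```python
# Hypothetical sketch of how a memory-aware parfor optimizer clamps the
# degree of parallelism k: each concurrent worker must fit the largest
# operation's memory estimate into its share of the driver budget.

def clamp_parallelism(max_par, budget_bytes, op_mem_estimates):
    worst = max(op_mem_estimates)          # largest per-worker estimate
    k = budget_bytes // worst              # workers that fit the budget
    return max(1, min(max_par, int(k)))    # at least single-threaded

# Driver Xmx 4g (with an assumed ~70% usable fraction) vs. the ~7.6-11.5 GB
# estimates from the log above: not even one worker fits, so k collapses to 1.
budget = int(4 * 1024**3 * 0.7)
estimates = [7.635730732e9, 1.1453595736e10, 3.819088816e9]
print(clamp_parallelism(16, budget, estimates))  # -> 1
```

With correct (smaller) estimates for the conv2d/pooling ops, the same check would admit a larger k, which is why the size propagation issue below matters.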

For more evidence, here is a fragment of the parfor plan with {{hybrid_spark + 
parfor}}:

{code}
----------------------------
 EXPLAIN OPT TREE (type=ABSTRACT_PLAN, size=122)
----------------------------
--PARFOR (lines 137-213), exec=CP, k=1, dp=NONE, tp=FACTORING, rm=REMOTE_SPARK
----GENERIC (lines 139-162), exec=CP, k=1
------rix, exec=CP, k=1
------b(+), exec=CP, k=1
------b(%%), exec=CP, k=1
------b(*), exec=CP, k=1
------b(-), exec=CP, k=1
------u(nrow), exec=CP, k=1
------b(min), exec=CP, k=1
------b(-), exec=CP, k=1
------b(+), exec=CP, k=1
------rix, exec=CP, k=1
------BIAS_ADD, exec=CP, k=16
------DIRECT_CONV2D, exec=CP, k=16
{code}

In contrast, the parfor plan with {{spark + parfor}} looks as follows:

{code}
----------------------------
 EXPLAIN OPT TREE (type=ABSTRACT_PLAN, size=122)
----------------------------
--PARFOR (lines 137-213), exec=CP, k=4, dp=NONE, tp=NAIVE, rm=REMOTE_SPARK
----GENERIC (lines 139-162), exec=CP, k=1
------rix, exec=SPARK, k=1
------b(+), exec=SPARK, k=1
------b(%%), exec=SPARK, k=1
------b(*), exec=SPARK, k=1
------b(-), exec=SPARK, k=1
------u(nrow), exec=CP, k=1
------b(min), exec=SPARK, k=1
------b(-), exec=SPARK, k=1
------b(+), exec=SPARK, k=1
------rix, exec=SPARK, k=1
------BIAS_ADD, exec=CP, k=4
------DIRECT_CONV2D, exec=CP, k=4
{code}

Note that the degree of parallelism of 4 is actually incorrect given the 
unknown memory estimates of the convolution ops above; this requires some 
deeper analysis.
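One plausible explanation for the k=4, stated as a hypothesis to verify rather than confirmed behavior: SPARK-tagged ops would not count against the driver-memory check, and the remaining CP-tagged convolution ops might fall back to a small default estimate instead of the true ~7.6 GB, making several workers appear to fit. A sketch of that suspected mis-accounting (all names and the fallback constant are made up for illustration):

```python
# Hypothetical sketch of the suspected mis-accounting under forced spark
# mode: ops tagged SPARK contribute nothing to the driver-side footprint,
# and unknown CP estimates fall back to a small default rather than the
# true ~7.6 GB, so multiple parfor workers appear to fit the budget.

DEFAULT_EST = 64 * 1024**2  # assumed fallback for unknown estimates

def worker_footprint(ops):
    """ops: list of (exec_type, mem_estimate_or_None) per operation."""
    total = 0
    for exec_type, est in ops:
        if exec_type == "SPARK":
            continue  # distributed op: no driver memory charged
        total = max(total, est if est is not None else DEFAULT_EST)
    return total

# CP conv ops with unknown estimates look tiny, so the budget admits many
# workers, even though the real footprint would exceed it.
ops = [("SPARK", 7.6e9), ("CP", None), ("CP", None)]
budget = int(4 * 1024**3 * 0.7)
print(budget // worker_footprint(ops))
```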

So the bottom line is that the real issue originates from size propagation 
problems, and there are two action items here: (1) address the size 
propagation issue, and (2) fix the potentially incorrect handling of memory 
estimates for convolution ops under forced spark execution mode.



> Improve Parfor parallelism for deep learning
> --------------------------------------------
>
>                 Key: SYSTEMML-1774
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1774
>             Project: SystemML
>          Issue Type: Improvement
>          Components: Algorithms, Compiler, ParFor
>    Affects Versions: SystemML 1.0
>            Reporter: Fei Hu
>              Labels: deeplearning
>         Attachments: Explain_For_HYBRID_SPARK_Mode_With_ErrorInfo.txt, 
> Explain_For_Spark_Mode.txt, MNIST_Distrib_Sgd.scala, 
> mnist_lenet_distrib_sgd.dml
>
>
> When running the [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  each mini-batch could ideally run in parallel without interaction. We tried 
> to force {{parfor (j in 1:parallel_batches)}} at line 137 of 
> {{nn/examples/mnist_lenet_distrib_sgd.dml}} to {{parfor (j in 
> 1:parallel_batches, mode=REMOTE_SPARK, opt=CONSTRAINED)}} in order to use 
> {{REMOTE_SPARK}} mode, but got 
> {{org.apache.sysml.runtime.DMLRuntimeException: Not supported: Instructions 
> of type other than CP instructions}} in {{SPARK}} mode, and 
> {{java.lang.NullPointerException}} in {{HYBRID_SPARK}} mode. More log 
> information can be found in the following comments. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
