[jira] [Commented] (SYSTEMML-1566) Possible regression from 0.13 -> 0.14 for MNIST LeNet script

Mike Dusenberry (JIRA) Wed, 03 May 2017 14:24:17 -0700

    [ 
https://issues.apache.org/jira/browse/SYSTEMML-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15995706#comment-15995706
 ]


Mike Dusenberry commented on SYSTEMML-1566:
-------------------------------------------

Update:  I've made some significant improvements in SYSTEMML-1554, 
SYSTEMML-1575, and SYSTEMML-1561 that have cut the execution time for this 
script roughly in half (1022s -> 493s).  Notice that the number of executed 
Spark instructions has also decreased considerably: 793 in the run below, 
versus the 2513 reported for 0.14 in the issue description.  SYSTEMML-1561 is 
still a work in progress, but I have a prototype solution.

{code}
17/05/03 14:17:55 INFO DMLScript: SystemML Statistics:
Total elapsed time:             493.003 sec.
Total compilation time:         1.852 sec.
Total execution time:           491.151 sec.
Number of compiled Spark inst:  143.
Number of executed Spark inst:  793.
Cache hits (Mem, WB, FS, HDFS): 156654/0/0/2.
Cache writes (WB, FS, HDFS):    79043/0/8.
Cache times (ACQr/m, RLS, EXP): 3.856/0.052/6.247/1.473 sec.
HOP DAGs recompiled (PRED, SB): 0/5978.
HOP DAGs recompile time:        3.750 sec.
Functions recompiled:           10.
Functions recompile time:       0.087 sec.
Spark ctx create time (lazy):   0.862 sec.
Spark trans counts (par,bc,col):789/789/2.
Spark trans times (par,bc,col): 1.049/0.357/3.771 secs.
Total JIT compile time:         145.35 sec.
Total JVM GC count:             433.
Total JVM GC time:              9.107 sec.
Heavy hitter instructions (name, time, count):
-- 1)   train   450.677 sec     1
-- 2)   conv2d_bias_add         219.343 sec     3298
-- 3)   predict         143.905 sec     9
-- 4)   conv2d_backward_filter  73.291 sec      1720
-- 5)   ba+*    23.381 sec      5949
-- 6)   sel+    20.917 sec      3369
-- 7)   +*      19.232 sec      10320
-- 8)   conv2d_backward_data    16.645 sec      860
-- 9)   sp_mapmm        15.905 sec      789
-- 10)  relu_maxpooling         14.457 sec      3298
{code}
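
For comparing runs like this across versions, it can help to pull the executed Spark instruction count and the heavy hitter rows out of the {{-stats}} output programmatically. The following is only a minimal sketch in Python, not part of SystemML: the names {{parse_stats}}, {{EXECUTED_SPARK}}, and {{HEAVY_HITTER}} are hypothetical, and the regular expressions assume the line layout shown in the statistics above, which may differ in other SystemML versions.

```python
import re

# Assumed line layouts, taken from the statistics output above:
#   "Number of executed Spark inst:  793."
#   "-- 2)   conv2d_bias_add         219.343 sec     3298"
EXECUTED_SPARK = re.compile(r"^Number of executed Spark inst:\s*(\d+)\.?\s*$")
HEAVY_HITTER = re.compile(r"^--\s*(\d+)\)\s+(\S+)\s+([\d.]+)\s+sec\s+(\d+)\s*$")


def parse_stats(text):
    """Return (executed_spark_inst, [(name, total_seconds, call_count), ...])."""
    executed = None
    hitters = []
    for raw in text.splitlines():
        line = raw.strip()
        m = EXECUTED_SPARK.match(line)
        if m:
            executed = int(m.group(1))
            continue
        m = HEAVY_HITTER.match(line)
        if m:
            hitters.append((m.group(2), float(m.group(3)), int(m.group(4))))
    return executed, hitters


# A few lines copied from the statistics above, used as sample input.
SAMPLE = """\
Number of executed Spark inst:  793.
Heavy hitter instructions (name, time, count):
-- 1)   train   450.677 sec     1
-- 2)   conv2d_bias_add         219.343 sec     3298
-- 9)   sp_mapmm        15.905 sec      789
"""

executed, hitters = parse_stats(SAMPLE)
print("executed Spark instructions:", executed)
for name, secs, count in hitters:
    # Average time per invocation, in milliseconds.
    print(f"{name}: {secs:.3f} s total, {secs / count * 1000:.1f} ms per call")
```

Running the same function over the attached stats.txt from the 0.13 and 0.14 runs would give the instruction counts side by side, assuming those files use the same layout.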

> Possible regression from 0.13 -> 0.14 for MNIST LeNet script
> ------------------------------------------------------------
>
>                 Key: SYSTEMML-1566
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1566
>             Project: SystemML
>          Issue Type: Bug
>            Reporter: Mike Dusenberry
>             Fix For: SystemML 1.0
>
>         Attachments: explain.txt, stats.txt
>
>
> For the 0.14 release testing, I tried out the [MNIST LeNet example | 
> https://github.com/apache/incubator-systemml/blob/master/scripts/nn/examples/mnist_lenet-train.dml]
>  on both 0.13 and 0.14 and noticed a possible regression.  Basically, on 0.14 
> the script took longer to run and had 2513 Spark instructions executed, while 
> on 0.13 only 864 Spark instructions were executed.  This was run locally on a 
> laptop using the 2 instructions at the top of the script (and copied below).  
> I've also attached the stats and runtime explain logs.
> 1. Download data
> {code}
> nn/examples/get_mnist_data.sh
> {code}
> 2. Execute from the {{scripts}} directory.
> {code}
> spark-submit --master local[*] --driver-memory 10G \
>   --conf spark.driver.maxResultSize=0 --conf spark.rpc.message.maxSize=128 \
>   SystemML.jar -f nn/examples/mnist_lenet-train.dml -stats -explain -nvargs \
>   train=nn/examples/data/mnist/mnist_train.csv \
>   test=nn/examples/data/mnist/mnist_test.csv C=1 Hin=28 Win=28 epochs=1 \
>   out_dir=nn/examples/model/mnist_lenet
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
