Imran Younus created SYSTEMML-1166:
--------------------------------------

             Summary: Performance issues with tsmm RIGHT
                 Key: SYSTEMML-1166
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1166
             Project: SystemML
          Issue Type: Bug
          Components: Runtime
         Environment: Spark 1.6.1
CentOS

            Reporter: Imran Younus


I'm running the following simple DML script:

{code}
N = 30000
L = Rand(cols=N, max=1, min=-1, pdf="uniform", rows=N)
LLT = L %*% t(L)
fn = sum(LLT * LLT)
print(fn)
{code}

Line 3 of this snippet ({{LLT = L %*% t(L)}}) is translated to {{tsmm RIGHT}}, and it takes 
almost 30 min to run in both standalone and Spark modes.
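
For context, {{tsmm}} is the transpose-self matrix multiply instruction; {{RIGHT}} corresponds 
to the {{X %*% t(X)}} pattern and {{LEFT}} to {{t(X) %*% X}} (matching the two explain outputs 
below), so both variants here take the same 30000 x 30000 input and produce a 30000 x 30000 
output. A toy DML sketch of the two patterns (small size and variable names are just for 
illustration):

{code}
# toy illustration of the two tsmm patterns (not the benchmark itself)
X  = Rand(rows=4, cols=4, min=-1, max=1, pdf="uniform")
R1 = X %*% t(X)    # the pattern compiled to tsmm RIGHT
R2 = t(X) %*% X    # the pattern compiled to tsmm LEFT
print(sum(R1 * R1))
print(sum(R2 * R2))
{code}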

Here is the output from explain and stats:

{code}
> /opt/spark-1.6.2-bin-hadoop2.6/bin/spark-submit 
> --master=spark://rr-ram4.softlayer.com:7077 --executor-memory=20g 
> --driver-memory=60g --executor-cores=4 --num-executors=20 
> sysml/target/SystemML.jar -f tsmm_verify.dml -explain -stats
17/01/05 18:11:10 INFO api.DMLScript: BEGIN DML run 01/05/2017 18:11:10
17/01/05 18:11:10 INFO conf.DMLConfig: Using internal default configuration 
settings.  If you wish to customize any settings, please supply a 
`SystemML-config.xml` file.
17/01/05 18:11:10 INFO api.DMLScript: EXPLAIN (RUNTIME):
# Memory Budget local/remote = 41216MB/?MB/?MB/?MB
# Degree of Parallelism (vcores) local/remote = 48/?
PROGRAM ( size CP/SP = 10/0 )
--MAIN PROGRAM
----GENERIC (lines 8-24) [recompile=false]
------CP createvar _mVar1 scratch_space//_p22241_10.168.31.117//_t0/temp1 true 
MATRIX binaryblock 30000 30000 1000 1000 900000000 copy
------CP rand 30000 30000 1000 1000 -1 1 1.0 -1 uniform 1.0 48 
_mVar1.MATRIX.DOUBLE
------CP createvar _mVar2 scratch_space//_p22241_10.168.31.117//_t0/temp2 true 
MATRIX binaryblock 30000 30000 1000 1000 -1 copy
------CP tsmm _mVar1.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE RIGHT 48
------CP rmvar _mVar1
------CP uasqk+ _mVar2.MATRIX.DOUBLE _Var3.SCALAR.DOUBLE 48
------CP rmvar _mVar2
------CP print _Var3.SCALAR.DOUBLE.false _Var4.SCALAR.STRING
------CP rmvar _Var3
------CP rmvar _Var4

17/01/05 18:11:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
6.00000141554168E12
17/01/05 18:40:38 INFO api.DMLScript: SystemML Statistics:
Total elapsed time:                   1768.639 sec.
Total compilation time:                        0.266 sec.
Total execution time:                                1768.373 sec.
Number of compiled Spark inst:                       0.
Number of executed Spark inst:                       0.
Cache hits (Mem, WB, FS, HDFS):                      2/0/0/0.
Cache writes (WB, FS, HDFS):                         2/1/0.
Cache times (ACQr/m, RLS, EXP):                      0.000/0.001/13.410/0.000 
sec.
HOP DAGs recompiled (PRED, SB):                      0/0.
HOP DAGs recompile time:   0.000 sec.
Spark ctx create time (lazy):    0.000 sec.
Spark trans counts (par,bc,col):0/0/0.
Spark trans times (par,bc,col): 0.000/0.000/0.000 secs.
Total JIT compile time:                           1.345 sec.
Total JVM GC count:                                     3.
Total JVM GC time:                                              4.428 sec.
Heavy hitter instructions (name, time, count):
-- 1)        tsmm         1762.014 sec  1
-- 2)        rand         5.210 sec     1
-- 3)        uasqk+       0.541 sec     1
-- 4)        rmvar        0.002 sec     4
-- 5)        print        0.002 sec     1
-- 6)        createvar    0.001 sec     2

17/01/05 18:40:38 INFO api.DMLScript: END DML run 01/05/2017 18:40:38
{code}

If I instead calculate {{t(L) %*% L}}, which compiles to {{tsmm LEFT}}, it takes only about 
5 min. Here is the output of explain and stats for this case:
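
For reference, the only change to the script is line 3; a sketch of the LEFT variant 
(variable name aside, presumably identical to the script above):

{code}
N = 30000
L = Rand(cols=N, max=1, min=-1, pdf="uniform", rows=N)
LTL = t(L) %*% L   # now compiled to tsmm LEFT
fn = sum(LTL * LTL)
print(fn)
{code}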

{code}
> /opt/spark-1.6.2-bin-hadoop2.6/bin/spark-submit 
> --master=spark://rr-ram4.softlayer.com:7077 --executor-memory=20g 
> --driver-memory=60g --executor-cores=4 --num-executors=20 
> sysml/target/SystemML.jar -f tsmm_verify.dml -explain -stats
17/01/05 18:03:13 INFO api.DMLScript: BEGIN DML run 01/05/2017 18:03:13
17/01/05 18:03:13 INFO conf.DMLConfig: Using internal default configuration 
settings.  If you wish to customize any settings, please supply a 
`SystemML-config.xml` file.
17/01/05 18:03:13 INFO api.DMLScript: EXPLAIN (RUNTIME):
# Memory Budget local/remote = 41216MB/?MB/?MB/?MB
# Degree of Parallelism (vcores) local/remote = 48/?
PROGRAM ( size CP/SP = 10/0 )
--MAIN PROGRAM
----GENERIC (lines 8-24) [recompile=false]
------CP createvar _mVar1 scratch_space//_p19190_10.168.31.117//_t0/temp1 true 
MATRIX binaryblock 30000 30000 1000 1000 900000000 copy
------CP rand 30000 30000 1000 1000 -1 1 1.0 -1 uniform 1.0 48 
_mVar1.MATRIX.DOUBLE
------CP createvar _mVar2 scratch_space//_p19190_10.168.31.117//_t0/temp2 true 
MATRIX binaryblock 30000 30000 1000 1000 -1 copy
------CP tsmm _mVar1.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE LEFT 48
------CP rmvar _mVar1
------CP uasqk+ _mVar2.MATRIX.DOUBLE _Var3.SCALAR.DOUBLE 48
------CP rmvar _mVar2
------CP print _Var3.SCALAR.DOUBLE.false _Var4.SCALAR.STRING
------CP rmvar _Var3
------CP rmvar _Var4

17/01/05 18:03:13 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
5.999491296172791E12
17/01/05 18:09:14 INFO api.DMLScript: SystemML Statistics:
Total elapsed time:                   361.293 sec.
Total compilation time:                       0.267 sec.
Total execution time:                               361.026 sec.
Number of compiled Spark inst:                      0.
Number of executed Spark inst:                      0.
Cache hits (Mem, WB, FS, HDFS):                     2/0/0/0.
Cache writes (WB, FS, HDFS):                        2/1/0.
Cache times (ACQr/m, RLS, EXP):                     0.000/0.001/12.827/0.000 
sec.
HOP DAGs recompiled (PRED, SB):                     0/0.
HOP DAGs recompile time:   0.000 sec.
Spark ctx create time (lazy):    0.000 sec.
Spark trans counts (par,bc,col):0/0/0.
Spark trans times (par,bc,col): 0.000/0.000/0.000 secs.
Total JIT compile time:                           1.32 sec.
Total JVM GC count:                                    3.
Total JVM GC time:                                      4.271 sec.
Heavy hitter instructions (name, time, count):
-- 1)        tsmm         354.384 sec  1
-- 2)        rand         5.181 sec    1
-- 3)        uasqk+       0.842 sec    1
-- 4)        rmvar        0.002 sec    4
-- 5)        createvar    0.001 sec    2
-- 6)        print        0.001 sec    1

17/01/05 18:09:14 INFO api.DMLScript: END DML run 01/05/2017 18:09:14
{code}

The time difference between the two cases is far too large: both instructions consume the same 
30000 x 30000 input and produce a 30000 x 30000 output, yet {{tsmm RIGHT}} spends 1762 sec 
where {{tsmm LEFT}} spends 354 sec. I'm sure there is a bug in the runtime code that is 
causing this.



