Hi Janardhan,

You can get instruction-level statistics with the commit https://github.com/apache/systemml/commit/648eb21d66f9cd8727090cdf950986765a7e6ee8 :

SystemML Statistics:
Total elapsed time: 18.956 sec.
Total compilation time: 1.924 sec.
Total execution time: 17.032 sec.
Number of compiled Spark inst: 3.
Number of executed Spark inst: 0.
Cache hits (Mem, WB, FS, HDFS): 29/0/0/1.
Cache writes (WB, FS, HDFS): 24/0/4.
Cache times (ACQr/m, RLS, EXP): 0.201/0.001/0.007/8.379 sec.
HOP DAGs recompiled (PRED, SB): 0/1.
HOP DAGs recompile time: 0.007 sec.
Spark ctx create time (lazy): 0.949 sec.
Spark trans counts (par,bc,col): 0/0/0.
Spark trans times (par,bc,col): 0.000/0.000/0.000 secs.
Total JIT compile time: 4.86 sec.
Total JVM GC count: 7.
Total JVM GC time: 0.192 sec.

Heavy hitter instructions:
 #  Instruction                     Time(s)  Count  Misc Timers
 1  write [PCA.dml 110:8-110:14]      7.628      1
 2  eigen [PCA.dml 85:1-85:1]         6.858      1  rlswr[0.000s,2], rlsev[0.000s,0], aqmd[0.000s,2]
 3  write [92:12-92:25]               0.689      1
 4  ba+* [PCA.dml 110:8-110:14]       0.500      1  rlswr[0.000s,1], aqmd[0.000s,1], aqrd[0.000s,2], rlsev[0.000s,0], rlsi[0.001s,2]
 5  tsmm [PCA.dml 81:5-81:16]         0.338      1  rlswr[0.000s,1], rlsev[0.000s,0], rlsi[0.000s,1], aqrd[0.000s,1], aqmd[0.000s,1]
 6  uacmean [PCA.dml 66:5-66:5]       0.320      1  rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], rlsi[0.000s,1], aqrd[0.200s,1]
 7  uacsqk+ [PCA.dml 70:23-70:23]     0.177      1  rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], aqrd[0.000s,1], rlsi[0.000s,1]
 8  ba+* [92:12-92:25]                0.175      1  rlswr[0.000s,1], aqrs[0.000s,1], aqrd[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], rlsi[0.000s,2]
 9  / [PCA.dml 75:16-75:31]           0.088      1  rlswr[0.000s,1], rlsev[0.000s,0], aqrd[0.000s,2], aqmd[0.000s,1], rlsi[0.000s,2]
10  - [PCA.dml 67:9-67:13]            0.048      1  rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], aqrd[0.000s,2], rlsi[0.000s,2]
11  write [90:11-90:23]               0.044      1
12  uack+ [PCA.dml 80:6-80:6]         0.036      1  rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], aqrd[0.000s,1], rlsi[0.000s,1]
13  uacmean [PCA.dml 72:2-72:2]       0.028      1  rlswr[0.000s,1], rlsev[0.000s,0], aqrd[0.000s,1], aqmd[0.000s,1], rlsi[0.000s,1]
14  -* [PCA.dml 81:5-81:22]           0.026      1  rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], aqrd[0.000s,2], rlsi[0.000s,2]
15  / [PCA.dml 81:5-81:22]            0.019      1  rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], aqrd[0.000s,1], rlsi[0.000s,1]
16  write [102:1-102:1]               0.018      1
17  tsmm [PCA.dml 81:36-81:46]        0.008      1  rlswr[0.000s,1], rlsev[0.000s,0], aqrd[0.000s,1], rlsi[0.000s,1], aqmd[0.000s,1]
18  ctableexpand [88:1-88:1]          0.007      1  rlsev[0.000s,0], rlsi[0.000s,2], aqms[0.000s,1], aqrd[0.000s,2], rlswr[0.002s,1]
19  seq [88:17-88:17]                 0.004      1  rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1]
20  ba+* [90:11-90:23]                0.003      1  rlswr[0.000s,1], rlsev[0.000s,0], aqrd[0.000s,1], rlsi[0.000s,2], aqmd[0.000s,1], aqrs[0.000s,1]
21  rsort [87:1-87:1]                 0.003      1  rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], rlsi[0.000s,1], aqrd[0.000s,1]
22  sqrt [PCA.dml 75:20-75:20]        0.002      1  rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], rlsi[0.000s,1], aqrd[0.000s,1]
23  !=                                0.001      1
24  rmvar [-1:-1--1:-1]               0.001     22
25  ^2 [PCA.dml 73:25-73:30]          0.001      1  rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], rlsi[0.000s,1], aqrd[0.000s,1]
26  / [PCA.dml 73:14-73:37]           0.001      1  rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], aqrd[0.000s,1], rlsi[0.000s,1]
27  -* [PCA.dml 73:15-73:19]          0.000      1  rlswr[0.000s,1], rlsev[0.000s,0], aqmd[0.000s,1], rlsi[0.000s,2], aqrd[0.000s,2]
28  sqrt [102:1-102:1]                0.000      1  rlswr[0.000s,1], rlsev[0.000s,0], rlsi[0.000s,1], aqrd[0.000s,1], aqmd[0.000s,1]
29  + [104:28-104:34]                 0.000      1
30  createvar [90:11-90:23]           0.000      1

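
To see how much time a single script line accounts for, the heavy hitter rows above can be grouped by their source position. The following is only a minimal sketch in Python: the helper name `time_per_line` and the regular expression are illustrative assumptions, not part of SystemML, and the sample rows are copied from the output above with the Misc Timers column dropped.

```python
import re

# Rows copied from the heavy hitter output above (Misc Timers column omitted).
ROWS = """\
5 tsmm [PCA.dml 81:5-81:16] 0.338 1
14 -* [PCA.dml 81:5-81:22] 0.026 1
15 / [PCA.dml 81:5-81:22] 0.019 1
17 tsmm [PCA.dml 81:36-81:46] 0.008 1
2 eigen [PCA.dml 85:1-85:1] 6.858 1
"""

# Fields: rank, opcode, [script beginLine:beginCol-endLine:endCol], time in seconds.
ROW_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\[(\S+) (\d+):\d+-\d+:\d+\]\s+([\d.]+)")


def time_per_line(text, script):
    """Sum instruction times per begin line for rows that name `script`.

    Hypothetical helper: rows without a script name (for example
    "[92:12-92:25]") do not match the pattern and are skipped.
    """
    totals = {}
    for row in text.splitlines():
        m = ROW_RE.match(row)
        if m and m.group(3) == script:
            line_no = int(m.group(4))
            totals[line_no] = totals.get(line_no, 0.0) + float(m.group(5))
    return totals


totals = time_per_line(ROWS, "PCA.dml")
print(round(totals[81], 3))  # total seconds attributed to PCA.dml line 81
```

For the run above, the four instructions located on PCA.dml line 81 add up to roughly 0.39 sec, which is small next to the eigen and write instructions.
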
At first glance (so please feel free to correct me if I am wrong):
Heavy hitter number 5 corresponds to the expression (t(A) %*% A).
Heavy hitter number 17 corresponds to the expression t(mu) %*% mu.
Heavy hitter number 15 corresponds to the expression (output of instruction 5) / scalar.
And so on ...

As an FYI, here are the steps I followed:

    wget https://raw.githubusercontent.com/apache/systemml/master/scripts/algorithms/PCA.dml
    wget https://raw.githubusercontent.com/apache/systemml/master/scripts/datagen/genRandData4PCA.dml
    wget https://raw.githubusercontent.com/apache/systemml/master/conf/SystemML-config.xml.template
    mv SystemML-config.xml.template SystemML-config.xml
    # Set systemml.stats.finegrained to true
    # Make sure you do a git pull to get the commit https://github.com/apache/systemml/commit/648eb21d66f9cd8727090cdf950986765a7e6ee8
    ~/spark-2.1.0-bin-hadoop2.7/bin/spark-submit --driver-memory 10g SystemML.jar -f genRandData4PCA.dml -nvargs R=10000 C=1000 F=binary OUT=pcaData.mtx
    ~/spark-2.1.0-bin-hadoop2.7/bin/spark-submit --driver-memory 10g SystemML.jar -f PCA.dml -stats 30 -nvargs INPUT=pcaData.mtx OUTPUT=pca-1000x1000-model PROJDATA=1 CENTER=1 SCALE=1

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar

From: Janardhan Pulivarthi <janardhan.pulivar...@gmail.com>
To: dev@systemml.apache.org
Date: 07/21/2017 08:57 AM
Subject: about performance statistics of PCA.dml

Hi Mike,

I'd like to know how expensive this critical code is:

    C = (t(A) %*% A)/(N-1) - (N/(N-1))*t(mu) %*% mu;

(at https://github.com/apache/systemml/blob/master/scripts/algorithms/PCA.dml#L81 ) in the SPARK setting, given:

1. A 60Kx700 input for A.
2. A data size of 28 MB with 100 continuous variables and 1 column with a numeric label variable, with reference to this comment ( https://issues.apache.org/jira/browse/SYSTEMML-831?focusedCommentId=15525147&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15525147 ).

Thank you,
Janardhan
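
For reference, the expression quoted in the question is the one-pass form of the sample covariance matrix, assuming mu is the row vector of column means of A and N is the number of rows. The sketch below, in plain Python on a tiny hand-made matrix, checks that form against the usual centered definition. The function names and the test data are illustrative assumptions and are not taken from PCA.dml.

```python
def col_means(A):
    """Column means of a matrix given as a list of rows."""
    n = len(A)
    return [sum(row[j] for row in A) / n for j in range(len(A[0]))]


def cov_one_pass(A):
    """C = (t(A) %*% A)/(N-1) - (N/(N-1)) * t(mu) %*% mu, entry by entry."""
    n, d = len(A), len(A[0])
    mu = col_means(A)
    return [[sum(A[i][j] * A[i][k] for i in range(n)) / (n - 1)
             - (n / (n - 1)) * mu[j] * mu[k]
             for k in range(d)]
            for j in range(d)]


def cov_centered(A):
    """Textbook sample covariance: center each column, then average products."""
    n, d = len(A), len(A[0])
    mu = col_means(A)
    return [[sum((A[i][j] - mu[j]) * (A[i][k] - mu[k]) for i in range(n)) / (n - 1)
             for k in range(d)]
            for j in range(d)]


# Tiny illustrative input: 4 rows (N = 4), 2 columns.
A = [[1.0, 2.0], [3.0, 6.0], [5.0, 7.0], [7.0, 9.0]]
C1 = cov_one_pass(A)
C2 = cov_centered(A)

# Largest entry-wise difference between the two forms.
print(max(abs(C1[j][k] - C2[j][k]) for j in range(2) for k in range(2)))
```

The one-pass form needs only t(A) %*% A and the column means, which is why it shows up in the statistics as a tsmm over A plus a much smaller tsmm over mu.
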