Thanks for sharing the skeleton of the script. Here are a few suggestions:
1) Impact of fine-grained stats: The provided script executes mostly scalar instructions. In such scenarios, the per-instruction time measurement can become a major performance bottleneck. I just executed this script with and without -stats and got end-to-end execution times of 132s and 54s, respectively, which confirms this.

2) Memory budget: You allocate a vector of 100M values, i.e., 800MB. The fact that the stats output shows Spark instructions means that you're running the driver with very little memory (maybe the default of 1GB?). When comparing with R, please ensure that both have the same memory budget. On large data, we would compile distributed operations, but of course you only benefit from that if you have a cluster - right now you're running in Spark local mode only.

3) Recompile time: Another thing that looks suspicious to me is the recompilation time of 15.529s for 4 recompilations. Typically, we see <1ms recompilation time per average DAG of 50-100 operators - could it be that there are some setup issues which lazily load classes and libraries?

Regards,
Matthias

On Sun, Jul 16, 2017 at 8:31 AM, arijit chakraborty <ak...@hotmail.com> wrote:
> Hi Matthias,
>
> I was trying the following code in both R and SystemML. The difference in
> speed is huge, in computational terms.
>
> R time: 1.837146 mins
> SystemML time: Wall time: 4min 33s
>
> The code I'm working on is very similar to this code. The only difference
> is that I'm doing a lot more computation within these two while-loops.
>
> Can you help me understand why I'm getting this difference? My
> understanding was that with a larger data size, SystemML's performance
> should be far better than R's; on smaller data sizes their performance
> is almost the same.
>
> The code has been tested on the same system. The Spark configuration is
> the following:
>
> import os
> import sys
> import pandas as pd
> import numpy as np
>
> spark_path = "C:\spark"
> os.environ['SPARK_HOME'] = spark_path
> os.environ['HADOOP_HOME'] = spark_path
>
> sys.path.append(spark_path + "/bin")
> sys.path.append(spark_path + "/python")
> sys.path.append(spark_path + "/python/pyspark/")
> sys.path.append(spark_path + "/python/lib")
> sys.path.append(spark_path + "/python/lib/pyspark.zip")
> sys.path.append(spark_path + "/python/lib/py4j-0.10.4-src.zip")
>
> from pyspark import SparkContext
> from pyspark import SparkConf
>
> sc = SparkContext("local[*]", "test")
>
> # SystemML specifications:
>
> from pyspark.sql import SQLContext
> import systemml as sml
> sqlCtx = SQLContext(sc)
> ml = sml.MLContext(sc)
>
> The code we tested:
>
> a = matrix(seq(1, 100000000, 1), 1, 100000000)
> b = 2
>
> break_cond_1 = 0
> while(break_cond_1 == 0) {
>   break_cond_2 = 0
>   while(break_cond_2 == 0) {
>     ## Check whether at least 10 even numbers are among the data points
>     c = 0
>     for(i in 1:ncol(a)) {
>       if(i %% 2 == 0) {
>         c = c + 1
>       }
>     }
>     #c = c + 2
>     if(c > 1000) {
>       break_cond_2 = 1
>     } else {
>       c = c + 2
>     }
>   }
>   if(break_cond_2 == 1) {
>     break_cond_1 = 1
>   } else {
>     c = c + 2
>   }
> }
>
> Please find some more SystemML information below:
>
> SystemML Statistics:
> Total elapsed time:             0.000 sec.
> Total compilation time:         0.000 sec.
> Total execution time:           0.000 sec.
> Number of compiled Spark inst:  5.
> Number of executed Spark inst:  5.
> Cache hits (Mem, WB, FS, HDFS): 3/0/0/0.
> Cache writes (WB, FS, HDFS):    6/0/0.
> Cache times (ACQr/m, RLS, EXP): 0.000/0.001/0.004/0.000 sec.
> HOP DAGs recompiled (PRED, SB): 0/4.
> HOP DAGs recompile time:        15.529 sec.
> Spark ctx create time (lazy):   0.091 sec.
> Spark trans counts (par,bc,col):0/0/0.
> Spark trans times (par,bc,col): 0.000/0.000/0.000 secs.
> Total JIT compile time:         0.232 sec.
> Total JVM GC count:             5467.
> Total JVM GC time:              8.237 sec.
> Heavy hitter instructions (name, time, count):
> -- 1)  %%           33.235 sec  100300000
> -- 2)  rmvar        27.762 sec  250750035
> -- 3)  ==           26.179 sec  100300017
> -- 4)  +            15.555 sec  50150000
> -- 5)  assignvar     6.611 sec  50150018
> -- 6)  sp_seq        0.675 sec  1
> -- 7)  sp_rshape     0.070 sec  1
> -- 8)  sp_chkpoint   0.017 sec  3
> -- 9)  seq           0.014 sec  3
> -- 10) rshape        0.003 sec  3
>
> Thank you!
>
> Arijit
>
> ________________________________
> From: arijit chakraborty <ak...@hotmail.com>
> Sent: Wednesday, July 12, 2017 12:21:43 AM
> To: dev@systemml.apache.org
> Subject: Re: Decaying performance of SystemML
>
> Thank you Matthias! I'll follow your suggestions. Regarding TB, I had the
> misconception that "g" implies 512 MB; that's why I set about 2TB of memory.
>
> Thanks again!
>
> Arijit
>
> ________________________________
> From: Matthias Boehm <mboe...@googlemail.com>
> Sent: Tuesday, July 11, 2017 10:42:58 PM
> To: dev@systemml.apache.org
> Subject: Re: Decaying performance of SystemML
>
> Without any specifics of scripts or datasets, it's unfortunately hard,
> if not impossible, to help you here. However, note that the memory
> configuration seems wrong: why would you configure the driver and
> executors with 2TB if you only have 256GB per node? Maybe you are
> observing an issue of swapping. Also note that maxResultSize is
> irrelevant in case SystemML creates the Spark context, because we would
> set it to unlimited anyway.
>
> Regarding generally recommended configurations, it's usually a good idea
> to use one executor per worker node, with the number of cores set to the
> number of virtual cores. This allows maximum sharing of broadcasts
> across tasks and hence reduces memory pressure.
>
> Regards,
> Matthias
>
> On 7/11/2017 9:36 AM, arijit chakraborty wrote:
> > Hi,
> >
> > I'm creating a process using SystemML, but after a certain period of
> > time the performance decreases.
> > 1) This warning message: WARN TaskSetManager: Stage 25254 contains a
> > task of very large size (3954 KB). The maximum recommended task size
> > is 100 KB.
> >
> > 2) For Spark, we are using these settings:
> >
> > spark.executor.memory      2048g
> > spark.driver.memory        2048g
> > spark.driver.maxResultSize 2048
> >
> > Is this good enough, or can we do something else to improve the
> > performance? We tried the Spark configuration suggested in the
> > documentation, but it didn't help much.
> >
> > 3) We are running on a system with 244 GB RAM, 32 cores, and 100 GB of
> > hard disk space.
> >
> > It would be great if anyone could guide me on how to improve the
> > performance.
> >
> > Thank you!
> >
> > Arijit
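
As an aside on the benchmark script itself: the inner for-loop evaluates on the order of 100M scalar `%%`, `==`, and `+` instructions, which is exactly what the heavy-hitter list above shows. The same count can be expressed as a single vectorized operation. A minimal sketch in plain NumPy (illustrative only, not SystemML's API; `n` is a small stand-in for the 100M-element vector):

```python
import numpy as np

# The scalar loop does: for i in 1..ncol(a): if i %% 2 == 0: c = c + 1
# The vectorized form computes the same count in one elementwise pass
# plus one reduction, instead of n scalar instructions.
n = 100                         # small stand-in for the 100M columns
idx = np.arange(1, n + 1)       # 1..n, like seq(1, n, 1)
c = int(np.sum(idx % 2 == 0))   # count of even values
print(c)                        # 50 even numbers in 1..100
```

In DML, the analogous rewrite would be something like `c = sum(seq(1, ncol(a)) %% 2 == 0)`, which compiles to a handful of vector instructions rather than ~100M scalar ones, and should shrink the `%%`/`==`/`+` heavy hitters accordingly.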