Thanks for sharing the skeleton of the script. Here are a few suggestions:
1) Impact of fine-grained stats: The provided script executes mostly scalar instructions. In such scenarios, the per-instruction time measurement can become a major performance bottleneck. I just executed this script with and without -stats and got end-to-end execution times of 132s and 54s, respectively, which confirms this.

2) Memory budget: You allocate a vector of 100M values, i.e., 800MB. The fact that the stats output shows Spark instructions means that you're running the driver with very little memory (maybe the default of 1GB?). When comparing with R, please ensure that both have the same memory budget. On large data, we would compile distributed operations, but of course you only benefit from that if you have a cluster - right now you're running in Spark local mode only.

3) Recompile time: Another thing that looks suspicious to me is the recompilation time of 15.529s for 4 recompilations. Typically, we see <1ms recompilation time per average DAG of 50-100 operators - could it be that there are some setup issues which lazily load classes and libraries?

Regards,
Matthias

On Sun, Jul 16, 2017 at 8:31 AM, arijit chakraborty <ak...@hotmail.com> wrote:
> Hi Matthias,
>
> I was trying the following code in both R and SystemML. The difference in
> speed is huge, in computational terms.
>
> R time: 1.837146 mins
> SystemML time: Wall time: 4min 33s
>
> The code I'm working on is very similar to this code. The only difference
> is that I'm doing a lot more computation within these two while-loops.
>
> Can you help me understand why I'm getting this difference? My
> understanding was that with a larger data size, SystemML's performance
> should be far better than R's; on smaller data sizes their performance
> is almost the same.
>
> The code has been tested on the same system. The Spark configuration is
> the following:
>
> import os
> import sys
> import pandas as pd
> import numpy as np
>
> spark_path = "C:\spark"
> os.environ['SPARK_HOME'] = spark_path
> os.environ['HADOOP_HOME'] = spark_path
>
> sys.path.append(spark_path + "/bin")
> sys.path.append(spark_path + "/python")
> sys.path.append(spark_path + "/python/pyspark/")
> sys.path.append(spark_path + "/python/lib")
> sys.path.append(spark_path + "/python/lib/pyspark.zip")
> sys.path.append(spark_path + "/python/lib/py4j-0.10.4-src.zip")
>
> from pyspark import SparkContext
> from pyspark import SparkConf
>
> sc = SparkContext("local[*]", "test")
>
> # SystemML specifications:
>
> from pyspark.sql import SQLContext
> import systemml as sml
> sqlCtx = SQLContext(sc)
> ml = sml.MLContext(sc)
>
> The code we tested:
>
> a = matrix(seq(1, 100000000, 1), 1, 100000000)
> b = 2
>
> break_cond_1 = 0
> while(break_cond_1 == 0) {
>   break_cond_2 = 0
>   while(break_cond_2 == 0) {
>     ## Check whether at least 10 even numbers are among the data points
>     c = 0
>     for(i in 1:ncol(a)) {
>       if(i %% 2 == 0) {
>         c = c + 1
>       }
>     }
>     #c = c + 2
>     if(c > 1000) {
>       break_cond_2 = 1
>     } else {
>       c = c + 2
>     }
>   }
>   if(break_cond_2 == 1) {
>     break_cond_1 = 1
>   } else {
>     c = c + 2
>   }
> }
>
> Please find some more SystemML information below:
>
> SystemML Statistics:
> Total elapsed time:             0.000 sec.
> Total compilation time:         0.000 sec.
> Total execution time:           0.000 sec.
> Number of compiled Spark inst:  5.
> Number of executed Spark inst:  5.
> Cache hits (Mem, WB, FS, HDFS): 3/0/0/0.
> Cache writes (WB, FS, HDFS):    6/0/0.
> Cache times (ACQr/m, RLS, EXP): 0.000/0.001/0.004/0.000 sec.
> HOP DAGs recompiled (PRED, SB): 0/4.
> HOP DAGs recompile time:        15.529 sec.
> Spark ctx create time (lazy):   0.091 sec.
> Spark trans counts (par,bc,col):0/0/0.
> Spark trans times (par,bc,col): 0.000/0.000/0.000 secs.
> Total JIT compile time:         0.232 sec.
> Total JVM GC count:             5467.
> Total JVM GC time:              8.237 sec.
> Heavy hitter instructions (name, time, count):
> -- 1)  %%           33.235 sec  100300000
> -- 2)  rmvar        27.762 sec  250750035
> -- 3)  ==           26.179 sec  100300017
> -- 4)  +            15.555 sec  50150000
> -- 5)  assignvar     6.611 sec  50150018
> -- 6)  sp_seq        0.675 sec  1
> -- 7)  sp_rshape     0.070 sec  1
> -- 8)  sp_chkpoint   0.017 sec  3
> -- 9)  seq           0.014 sec  3
> -- 10) rshape        0.003 sec  3
>
> Thank you!
>
> Arijit
>
> ________________________________
> From: arijit chakraborty <ak...@hotmail.com>
> Sent: Wednesday, July 12, 2017 12:21:43 AM
> To: dev@systemml.apache.org
> Subject: Re: Decaying performance of SystemML
>
> Thank you Matthias! I'll follow your suggestions. Regarding TB, I had the
> misconception that "g" implies 512 MB; that's why I set about 2TB of memory.
>
> Thanks again!
>
> Arijit
>
> ________________________________
> From: Matthias Boehm <mboe...@googlemail.com>
> Sent: Tuesday, July 11, 2017 10:42:58 PM
> To: dev@systemml.apache.org
> Subject: Re: Decaying performance of SystemML
>
> Without any specifics of scripts or datasets, it's unfortunately hard,
> if not impossible, to help you here. However, note that the memory
> configuration seems wrong: why would you configure the driver and
> executors with 2TB if you only have 256GB per node? Maybe you are
> observing an issue of swapping. Also note that maxResultSize is
> irrelevant in case SystemML creates the Spark context, because we would
> set it to unlimited anyway.
>
> Regarding generally recommended configurations, it's usually a good idea
> to use one executor per worker node, with the number of cores set to the
> number of virtual cores. This allows maximum sharing of broadcasts
> across tasks and hence reduces memory pressure.
>
> Regards,
> Matthias
>
> On 7/11/2017 9:36 AM, arijit chakraborty wrote:
> > Hi,
> >
> > I'm creating a process using SystemML, but after a certain period of
> > time the performance decreases.
> > 1) This warning message: WARN TaskSetManager: Stage 25254 contains a
> > task of very large size (3954 KB). The maximum recommended task size
> > is 100 KB.
> >
> > 2) For Spark, we are using these settings:
> >
> > spark.executor.memory      2048g
> > spark.driver.memory        2048g
> > spark.driver.maxResultSize 2048
> >
> > Is this good enough, or can we do something else to improve the
> > performance? We tried the Spark configuration suggested in the
> > documentation, but it didn't help much.
> >
> > 3) We are running on a system with 244 GB RAM, 32 cores, and 100 GB of
> > hard disk space.
> >
> > It would be great if anyone could guide me on how to improve the
> > performance.
> >
> > Thank you!
> >
> > Arijit
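
As an aside on the benchmark script itself: the inner for-loop evaluates on the order of 100M scalar `%%`, `==`, and `+` instructions, which is exactly what the heavy-hitter list above shows. The same count can be expressed as a single vectorized operation. A minimal sketch in plain NumPy (illustrative only, not SystemML's API; `n` is a small stand-in for the 100M-element vector):

```python
import numpy as np

# The scalar loop does: for i in 1..ncol(a): if i %% 2 == 0: c = c + 1
# The vectorized form computes the same count in one elementwise pass
# plus one reduction, instead of n scalar instructions.
n = 100                         # small stand-in for the 100M columns
idx = np.arange(1, n + 1)       # 1..n, like seq(1, n, 1)
c = int(np.sum(idx % 2 == 0))   # count of even values
print(c)                        # 50 even numbers in 1..100
```

In DML, the analogous rewrite would be something like `c = sum(seq(1, ncol(a)) %% 2 == 0)`, which compiles to a handful of vector instructions rather than ~100M scalar ones, and should shrink the `%%`/`==`/`+` heavy hitters accordingly.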