Re: Decaying performance of SystemML

arijit chakraborty Sun, 16 Jul 2017 08:32:25 -0700

Hi Matthias,


I tried the following code in both R and SystemML. The difference in speed 
is huge in computational terms.

R time: 1.837146 mins
SystemML Time: Wall time: 4min 33s

The code I'm working on is very similar to this code. The only difference is 
that I'm doing a lot more computation within these two while-loops.

Can you help me understand why I'm getting this difference? My understanding 
was that with larger data sizes, SystemML's performance should be far better 
than R's. With smaller data sizes, their performance is almost the same.

Both versions were tested on the same system. The Spark configuration is the 
following.


import os
import sys
import pandas as pd
import numpy as np

spark_path = r"C:\spark"  # raw string so the backslash is not treated as an escape
os.environ['SPARK_HOME'] = spark_path
os.environ['HADOOP_HOME'] = spark_path

sys.path.append(spark_path + "/bin")
sys.path.append(spark_path + "/python")
sys.path.append(spark_path + "/python/pyspark/")
sys.path.append(spark_path + "/python/lib")
sys.path.append(spark_path + "/python/lib/pyspark.zip")
sys.path.append(spark_path + "/python/lib/py4j-0.10.4-src.zip")

from pyspark import SparkContext
from pyspark import SparkConf

sc = SparkContext("local[*]", "test")


# SystemML Specifications:


from pyspark.sql import SQLContext
import systemml as sml
sqlCtx = SQLContext(sc)
ml = sml.MLContext(sc)



The code we tested:


a = matrix(seq(1, 100000000, 1), 1 , 100000000)

b = 2

break_cond_1 = 0
while(break_cond_1 == 0 ){
  break_cond_2 = 0
  while(break_cond_2 == 0 ){

    ## Checking if at least 10 of the data points are even
    c = 0
    for(i in 1:ncol(a)){

      if( i %% 2 == 0){
        c = c + 1
      }

    }
    #c = c + 2
    if( c > 1000){

      break_cond_2 = 1
    }else{

      c = c +  2


    }

  }

  if(break_cond_2 == 1){
    break_cond_1 = 1
  }else{

    c = c + 2
  }



}

Please find some more SystemML information below:

SystemML Statistics:
Total elapsed time:             0.000 sec.
Total compilation time:         0.000 sec.
Total execution time:           0.000 sec.
Number of compiled Spark inst:  5.
Number of executed Spark inst:  5.
Cache hits (Mem, WB, FS, HDFS): 3/0/0/0.
Cache writes (WB, FS, HDFS):    6/0/0.
Cache times (ACQr/m, RLS, EXP): 0.000/0.001/0.004/0.000 sec.
HOP DAGs recompiled (PRED, SB): 0/4.
HOP DAGs recompile time:        15.529 sec.
Spark ctx create time (lazy):   0.091 sec.
Spark trans counts (par,bc,col):0/0/0.
Spark trans times (par,bc,col): 0.000/0.000/0.000 secs.
Total JIT compile time:         0.232 sec.
Total JVM GC count:             5467.
Total JVM GC time:              8.237 sec.
Heavy hitter instructions (name, time, count):
-- 1)   %%      33.235 sec      100300000
-- 2)   rmvar   27.762 sec      250750035
-- 3)   ==      26.179 sec      100300017
-- 4)   +       15.555 sec      50150000
-- 5)   assignvar       6.611 sec       50150018
-- 6)   sp_seq  0.675 sec       1
-- 7)   sp_rshape       0.070 sec       1
-- 8)   sp_chkpoint     0.017 sec       3
-- 9)   seq     0.014 sec       3
-- 10)  rshape  0.003 sec     3
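
One way to read the heavy-hitter list above is to add up the five scalar instructions and divide each time by its call count. The short Python sketch below does only that arithmetic on the numbers copied from the statistics; the function name `summarize` is illustrative and not part of SystemML.

```python
# Heavy-hitter rows copied from the statistics above: (name, seconds, count).
heavy_hitters = [
    ("%%", 33.235, 100300000),
    ("rmvar", 27.762, 250750035),
    ("==", 26.179, 100300017),
    ("+", 15.555, 50150000),
    ("assignvar", 6.611, 50150018),
]


def summarize(rows):
    """Return total seconds, total call count, and nanoseconds per call by name."""
    total_sec = sum(sec for _, sec, _ in rows)
    total_calls = sum(cnt for _, _, cnt in rows)
    per_call_ns = {name: sec / cnt * 1e9 for name, sec, cnt in rows}
    return total_sec, total_calls, per_call_ns


total_sec, total_calls, per_call_ns = summarize(heavy_hitters)
print(f"top-5 scalar instructions: {total_sec:.3f} s over {total_calls} calls")
for name, ns in per_call_ns.items():
    print(f"{name:>10}: {ns:.0f} ns per call")
```

Each call is cheap on its own; the time reported for these five instructions comes from executing several hundred million of them inside the `for` loop.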






Thank you!

Arijit


________________________________
From: arijit chakraborty <[email protected]>
Sent: Wednesday, July 12, 2017 12:21:43 AM
To: [email protected]
Subject: Re: Decaying performance of SystemML

Thank you, Matthias! I'll follow your suggestions. Regarding the TB setting, I 
was under the mistaken impression that "g" means 512 MB. That's why I 
configured around 2 TB of memory.


Thanks again!

Arijit

________________________________
From: Matthias Boehm <[email protected]>
Sent: Tuesday, July 11, 2017 10:42:58 PM
To: [email protected]
Subject: Re: Decaying performance of SystemML

Without any specifics of the scripts or datasets, it is unfortunately hard,
if not impossible, to help you here. However, note that the memory
configuration seems wrong. Why would you configure the driver and
executors with 2TB if you only have 256GB per node? Maybe you are observing
swapping. Also note that maxResultSize is irrelevant if SystemML creates
the Spark context, because we would set it to unlimited anyway.
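
As a rough illustration of the mismatch described above, the following Python sketch converts JVM-style memory strings (the `k`/`m`/`g`/`t` suffixes used by settings such as `spark.executor.memory`) to bytes and compares the configured value with the machine's RAM. The helper names `memory_to_bytes` and `fits_in_ram` are hypothetical and not part of Spark or SystemML; the 244g figure is the RAM size reported in the quoted message.

```python
# Binary size suffixes used in JVM-style memory strings such as "2048g".
_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def memory_to_bytes(value):
    """Convert a string such as '2048g' or '512m' to a number of bytes."""
    text = value.strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    raise ValueError(f"expected a size with a k/m/g/t suffix, got {value!r}")


def fits_in_ram(setting, physical_ram):
    """Return True if the configured size does not exceed physical RAM."""
    return memory_to_bytes(setting) <= memory_to_bytes(physical_ram)


configured = "2048g"  # spark.executor.memory / spark.driver.memory in the thread
node_ram = "244g"     # RAM reported for the machine in the quoted message
print(memory_to_bytes(configured) // 1024**4, "TiB configured")
print("fits in RAM:", fits_in_ram(configured, node_ram))
```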

As for generally recommended configurations, it's usually a good idea
to use one executor per worker node with the number of cores set to the
number of virtual cores. This allows maximum sharing of broadcasts
across tasks and hence reduces memory pressure.
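
A minimal sketch of that layout, assuming a hypothetical cluster of 4 worker nodes with 32 virtual cores each and 200g reserved per executor (none of these figures except the 32 cores come from this thread). The property names are standard Spark settings; `executor_settings` is an illustrative helper, and `spark.executor.instances` only takes effect on cluster managers that honor a static executor count.

```python
def executor_settings(worker_nodes, vcores_per_node, executor_memory):
    """Build Spark properties for one executor per worker node."""
    return {
        # One executor per worker node.
        "spark.executor.instances": str(worker_nodes),
        # All virtual cores of a node go to its single executor.
        "spark.executor.cores": str(vcores_per_node),
        # Must stay below the physical RAM of a node.
        "spark.executor.memory": executor_memory,
    }


# Hypothetical cluster: 4 workers, 32 virtual cores each, 200g per executor.
settings = executor_settings(worker_nodes=4, vcores_per_node=32,
                             executor_memory="200g")
for key, value in sorted(settings.items()):
    print(f"--conf {key}={value}")
```

The printed flags can be passed to `spark-submit`, or the dictionary can be applied in PySpark with `SparkConf().setAll(settings.items())`.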

Regards,
Matthias

On 7/11/2017 9:36 AM, arijit chakraborty wrote:
> Hi,
>
>
> I'm creating a process using SystemML, but after a certain period of time, the 
> performance decreases.
>
>
> 1) This warning message: WARN TaskSetManager: Stage 25254 contains a task of 
> very large size (3954 KB). The maximum recommended task size is 100 KB.
>
>
> 2) For Spark, we are implementing this setting:
>
>                      spark.executor.memory 2048g
>
>                       spark.driver.memory 2048g
>
>                 spark.driver.maxResultSize 2048
>
> Is this good enough, or can we do something else to improve the performance? 
> We tried the Spark implementation suggested in the documentation, but it 
> didn't help much.
>
>
> 3) We are running on a system with 244 GB of RAM, 32 cores, and 100 GB of hard 
> disk space.
>
>
> It would be great if anyone could guide me on how to improve the performance.
>
>
> Thank you!
>
> Arijit
>
