[ 
https://issues.apache.org/jira/browse/SYSTEMML-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-2200.
------------------------------------
       Resolution: Fixed
         Assignee: Matthias Boehm
    Fix Version/s: SystemML 1.1

> KMeans w/ codegen shows very bad performance
> --------------------------------------------
>
>                 Key: SYSTEMML-2200
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2200
>             Project: SystemML
>          Issue Type: Sub-task
>            Reporter: Matthias Boehm
>            Assignee: Matthias Boehm
>            Priority: Major
>             Fix For: SystemML 1.1
>
>
> While codegen worked extremely well for KMeans with 1 run, we currently see 
> performance issues in a parfor setting with 10 concurrent runs, all of which 
> spawn distributed Spark operations. In detail, this is due to particular plan 
> choices that are affected by the reduced local memory budget per parfor 
> worker. However, these issues can be overcome by avoiding unnecessary RDD 
> joins in distributed codegen operations via better broadcast handling 
> (currently, the first input is always assumed to be an RDD).
> {code}
> Total elapsed time:           9305.981 sec.
> Total compilation time:               3.023 sec.
> Total execution time:         9302.958 sec.
> Number of compiled Spark inst:        21.
> Number of executed Spark inst:        193.
> Cache hits (Mem, WB, FS, HDFS):       1242/0/0/91.
> Cache writes (WB, FS, HDFS):  456/188/1.
> Cache times (ACQr/m, RLS, EXP):       10086.631/0.011/114.967/1.291 sec.
> HOP DAGs recompiled (PRED, SB):       0/108.
> HOP DAGs recompile time:      2.733 sec.
> Functions recompiled:         1.
> Functions recompile time:     0.043 sec.
> Codegen compile (DAG,CP,JC):  176/430/21.
> Codegen enum (ALLt/p,EVALt/p):        48076/47974/39249/38324.
> Codegen compile times (DAG,JC):       3.024/0.491 sec.
> Codegen enum plan cache hits: 0/0.
> Codegen op plan cache hits:   395/416.
> Spark ctx create time (lazy): 19.506 sec.
> Spark trans counts (par,bc,col):0/179/91.
> Spark trans times (par,bc,col):       0.000/1.954/10086.614 secs.
> ParFor loops optimized:               1.
> ParFor optimize time:         0.141 sec.
> ParFor initialize time:               0.022 sec.
> ParFor result merge time:     0.059 sec.
> ParFor total update in-place: 0/40/50
> Total JIT compile time:               98.963 sec.
> Total JVM GC count:           374.
> Total JVM GC time:            72.456 sec.
> Heavy hitter instructions:
>   #  Instruction          Time(s)  Count
>   1  sp_spoofRATMP63   73,750.553     89
>   2  spoofRATMP43      10,195.724     89
>   3  sp_chkpoint           20.239     12
>   4  sp_uasqk+             14.347      1
>   5  spoofRATMP52          10.496     89
>   6  ba+*                   9.273     15
>   7  sp_mapmm               1.543      1
>   8  write                  1.291      1
>   9  /                      1.127     92
>  10  sp_spoofRATMP116       0.930     89
> {code}
> An initial prototype that avoids unnecessary shuffles reduced the total 
> elapsed time from 9305s to 1607s, but additional improvements are possible. 
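
The broadcast-handling idea described in the issue can be illustrated with a small, Spark-free sketch. This is not SystemML's implementation: the function names, the `Plan` type, the input sizes, and the single per-worker `budget` are all hypothetical, chosen only to show why always treating the first input as the RDD can force a join that a size-aware choice avoids.

```python
from typing import Dict, List, NamedTuple


class Plan(NamedTuple):
    main: str              # input kept as the distributed (RDD-like) side
    broadcasts: List[str]  # side inputs shipped to every task
    joins: List[str]       # side inputs that need a shuffle-based join


def _assign(main: str, rest: List[str], sizes: Dict[str, int], budget: int) -> Plan:
    """Broadcast the smallest side inputs first; join whatever does not fit."""
    broadcasts, joins = [], []
    remaining = budget
    for name in sorted(rest, key=lambda n: sizes[n]):
        if sizes[name] <= remaining:
            broadcasts.append(name)
            remaining -= sizes[name]
        else:
            joins.append(name)
    return Plan(main, broadcasts, joins)


def plan_first_input_as_rdd(sizes: Dict[str, int], budget: int) -> Plan:
    """Hypothetical baseline: the first input is always the distributed side."""
    names = list(sizes)
    return _assign(names[0], names[1:], sizes, budget)


def plan_largest_input_as_rdd(sizes: Dict[str, int], budget: int) -> Plan:
    """Hypothetical alternative: keep the largest input distributed."""
    main = max(sizes, key=lambda n: sizes[n])
    rest = [n for n in sizes if n != main]
    return _assign(main, rest, sizes, budget)


# Illustrative sizes only (arbitrary units): small centroids C, large data X.
sizes = {"C": 10, "X": 1000}
baseline = plan_first_input_as_rdd(sizes, budget=100)
improved = plan_largest_input_as_rdd(sizes, budget=100)
```

With these made-up sizes, the baseline keeps the small input distributed and has to join the large one, while the size-aware variant broadcasts the small input and needs no join. A tighter budget, as with a reduced per-worker memory budget in parfor, makes fewer inputs eligible for broadcast in either policy.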


