[
https://issues.apache.org/jira/browse/SYSTEMML-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthias Boehm closed SYSTEMML-2200.
------------------------------------
Resolution: Fixed
Assignee: Matthias Boehm
Fix Version/s: SystemML 1.1
> KMeans w/ codegen shows very bad performance
> --------------------------------------------
>
> Key: SYSTEMML-2200
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2200
> Project: SystemML
> Issue Type: Sub-task
> Reporter: Matthias Boehm
> Assignee: Matthias Boehm
> Priority: Major
> Fix For: SystemML 1.1
>
>
> While codegen worked extremely well for KMeans with 1 run, we currently see
> performance issues in a parfor setting with 10 concurrent runs, all of which
> spawn distributed Spark operations. In detail, this is due to particular plan
> choices that are affected by the reduced local memory budget per parfor
> worker. However, these issues can be overcome by avoiding unnecessary RDD
> joins in distributed codegen operations via better broadcast handling
> (currently, the first input is always assumed to be an RDD).
> {code}
> Total elapsed time: 9305.981 sec.
> Total compilation time: 3.023 sec.
> Total execution time: 9302.958 sec.
> Number of compiled Spark inst: 21.
> Number of executed Spark inst: 193.
> Cache hits (Mem, WB, FS, HDFS): 1242/0/0/91.
> Cache writes (WB, FS, HDFS): 456/188/1.
> Cache times (ACQr/m, RLS, EXP): 10086.631/0.011/114.967/1.291 sec.
> HOP DAGs recompiled (PRED, SB): 0/108.
> HOP DAGs recompile time: 2.733 sec.
> Functions recompiled: 1.
> Functions recompile time: 0.043 sec.
> Codegen compile (DAG,CP,JC): 176/430/21.
> Codegen enum (ALLt/p,EVALt/p): 48076/47974/39249/38324.
> Codegen compile times (DAG,JC): 3.024/0.491 sec.
> Codegen enum plan cache hits: 0/0.
> Codegen op plan cache hits: 395/416.
> Spark ctx create time (lazy): 19.506 sec.
> Spark trans counts (par,bc,col):0/179/91.
> Spark trans times (par,bc,col): 0.000/1.954/10086.614 secs.
> ParFor loops optimized: 1.
> ParFor optimize time: 0.141 sec.
> ParFor initialize time: 0.022 sec.
> ParFor result merge time: 0.059 sec.
> ParFor total update in-place: 0/40/50
> Total JIT compile time: 98.963 sec.
> Total JVM GC count: 374.
> Total JVM GC time: 72.456 sec.
> Heavy hitter instructions:
> # Instruction Time(s) Count
> 1 sp_spoofRATMP63 73,750.553 89
> 2 spoofRATMP43 10,195.724 89
> 3 sp_chkpoint 20.239 12
> 4 sp_uasqk+ 14.347 1
> 5 spoofRATMP52 10.496 89
> 6 ba+* 9.273 15
> 7 sp_mapmm 1.543 1
> 8 write 1.291 1
> 9 / 1.127 92
> 10 sp_spoofRATMP116 0.930 89
> {code}
> An initial prototype that avoids the unnecessary shuffle reduced the total
> elapsed time from 9305s to 1607s (roughly 5.8x), but additional improvements
> are possible.
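
The difference between an RDD join and broadcast handling mentioned in the description can be illustrated with a minimal sketch in plain Python. This is not SystemML or Spark code: the function names `shuffle_join` and `broadcast_join`, the integer keys, the toy data, and the "records moved" cost measure are all assumptions made for illustration only.

```python
from collections import defaultdict


def shuffle_join(large, small, num_partitions):
    """Join by repartitioning BOTH inputs on the key (models a shuffle-based join).

    Returns the sorted joined records and the number of records moved.
    Assumes integer keys and unique keys on the small side.
    """
    large_parts = defaultdict(list)
    small_parts = defaultdict(list)
    moved = 0
    for key, value in large:
        large_parts[key % num_partitions].append((key, value))
        moved += 1  # every record of the large input is repartitioned
    for key, value in small:
        small_parts[key % num_partitions].append((key, value))
        moved += 1  # every record of the small input is repartitioned

    joined = []
    for p in range(num_partitions):
        lookup = dict(small_parts[p])
        for key, value in large_parts[p]:
            if key in lookup:
                joined.append((key, value, lookup[key]))
    return sorted(joined), moved


def broadcast_join(large_partitions, small):
    """Join by copying the small input to every partition of the large input.

    The large input stays where it is; only copies of the small side move.
    """
    lookup = dict(small)
    moved = len(lookup) * len(large_partitions)
    joined = []
    for partition in large_partitions:
        for key, value in partition:
            if key in lookup:
                joined.append((key, value, lookup[key]))
    return sorted(joined), moved


# Hypothetical data: a large input of 1000 records and a small side input
# of 4 records, with the large input spread over 8 partitions.
small = [(k, f"side-{k}") for k in range(4)]
large = [(i % 4, i) for i in range(1000)]
large_partitions = [large[i::8] for i in range(8)]

joined_s, moved_s = shuffle_join(large, small, num_partitions=8)
joined_b, moved_b = broadcast_join(large_partitions, small)
```

Both functions produce the same joined records, but in this toy setting the shuffle variant moves every record of both inputs, while the broadcast variant moves only copies of the small side. Whether a broadcast is actually viable in a real plan depends on the size of the side input relative to the available memory budget, which is the trade-off the description attributes to the reduced per-worker budget under parfor.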