[
https://issues.apache.org/jira/browse/SYSTEMML-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881916#comment-15881916
]
Matthias Boehm commented on SYSTEMML-1336:
------------------------------------------
With an improved parfor optimizer that now also considers the what-if scenario
of conditional partitioning the runtime improved substantially.
{code}
Total elapsed time: 125.192 sec.
Total compilation time: 2.085 sec.
Total execution time: 123.106 sec.
Number of compiled Spark inst: 1.
Number of executed Spark inst: 1.
Cache hits (Mem, WB, FS, HDFS): 4/0/0/122.
Cache writes (WB, FS, HDFS): 3/0/2.
Cache times (ACQr/m, RLS, EXP): 11.163/0.001/0.004/0.369 sec.
HOP DAGs recompiled (PRED, SB): 0/0.
HOP DAGs recompile time: 0.000 sec.
Spark ctx create time (lazy): 31.174 sec.
Spark trans counts (par,bc,col):0/0/0.
Spark trans times (par,bc,col): 0.000/0.000/0.000 secs.
ParFor loops optimized: 1.
ParFor optimize time: 2.065 sec.
ParFor initialize time: 0.000 sec.
ParFor result merge time: 0.844 sec.
ParFor total update in-place: 0/0/0
Total JIT compile time: 23.855 sec.
Total JVM GC count: 7.
Total JVM GC time: 4.279 sec.
Heavy hitter instructions (name, time, count):
-- 1) ParFor-DPESP 108.240 sec 1
-- 2) uacmax 10.992 sec 1
-- 3) write 0.115 sec 1
-- 4) > 0.087 sec 1
-- 5) rand 0.005 sec 1
-- 6) * 0.002 sec 1
-- 7) rmvar 0.000 sec 8
-- 8) createvar 0.000 sec 6
-- 9) uamax 0.000 sec 1
-- 10) assignvar 0.000 sec 5
{code}
> Improve parfor exec type selection (w/ potential data partitioning)
> -------------------------------------------------------------------
>
> Key: SYSTEMML-1336
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1336
> Project: SystemML
> Issue Type: Sub-task
> Components: Compiler
> Reporter: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> This task aims to address suboptimal parfor optimizer choices for
> partitionable scenarios with large driver memory. Currently, we only apply
> partitioning, if the right indexing operation does not fit in memory of the
> driver or remote tasks. The execution type selection is then unaware of
> potential partitioning, and does not revert this decision - this is
> problematic, because the large input likely exceeds the memory budget of
> remote tasks, ultimately causing the optimizer to fall back to a local parfor
> with very small degree of parallelism k.
> On our perftest 8GB Univariate stats scenario (with 20GB driver, i.e., 14GB
> memory budget), this lead to a local parfor with k=1 and thus, unnecessarily
> high execution time.
> {code}
> Total elapsed time: 781.233 sec.
> Total compilation time: 2.059 sec.
> Total execution time: 779.175 sec.
> Number of compiled Spark inst: 0.
> Number of executed Spark inst: 0.
> Cache hits (Mem, WB, FS, HDFS): 27904/0/0/2.
> Cache writes (WB, FS, HDFS): 3134/0/1.
> Cache times (ACQr/m, RLS, EXP): 9.200/0.022/0.301/0.300 sec.
> HOP DAGs recompiled (PRED, SB): 0/100.
> HOP DAGs recompile time: 0.238 sec.
> Spark ctx create time (lazy): 0.000 sec.
> Spark trans counts (par,bc,col):0/0/0.
> Spark trans times (par,bc,col): 0.000/0.000/0.000 secs.
> ParFor loops optimized: 1.
> ParFor optimize time: 1.985 sec.
> ParFor initialize time: 0.007 sec.
> ParFor result merge time: 0.003 sec.
> ParFor total update in-place: 0/0/13900
> Total JIT compile time: 13.542 sec.
> Total JVM GC count: 29.
> Total JVM GC time: 3.49 sec.
> Heavy hitter instructions (name, time, count):
> -- 1) cm 479.000 sec 2700
> -- 2) qsort 228.928 sec 900
> -- 3) qpick 20.598 sec 1800
> -- 4) rangeReIndex 16.051 sec 2999
> -- 5) uamean 12.867 sec 900
> -- 6) uacmax 9.870 sec 1
> -- 7) ctable 3.158 sec 100
> -- 8) uamin 2.589 sec 1000
> -- 9) uamax 2.560 sec 1101
> -- 10) write 0.300 sec 1
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)