[ https://issues.apache.org/jira/browse/SYSTEMML-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias Boehm closed SYSTEMML-1350.
------------------------------------
> Performance parfor spark datapartition-execute
> ----------------------------------------------
>
> Key: SYSTEMML-1350
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1350
> Project: SystemML
> Issue Type: Sub-task
> Components: APIs, Runtime
> Reporter: Matthias Boehm
> Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> Our fused parfor spark datapartition-execute job, as used for large
> scenarios of univariate statistics, exhibits some unnecessary runtime
> overheads. Specifically, the potential improvements include:
> 1) Incremental nnz maintenance on partition collect
> 2) Reuse of dense partitions per task (avoid reallocation)
> 3) Explicitly control the number of output partitions (avoid OOMs, reduce
> memory pressure)
> 4) Avoid unnecessary rdd export on parfor data partitioning
> Points (3) and (4) also apply to the parfor spark datapartition job.
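Points (1) and (2) can be illustrated with a small, self-contained sketch. It assumes a task processes several partitions one after another and copies collected rows into a dense row-major buffer. The class DensePartitionBuffer and its methods are hypothetical and are not SystemML's MatrixBlock API: the buffer is reallocated only when a partition does not fit, and the non-zero count (nnz) is adjusted per written cell instead of being recomputed by a full scan after collection.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not SystemML's MatrixBlock API): a dense partition
 * buffer that is reused across the partitions of one task and maintains its
 * non-zero count incrementally while rows are collected.
 */
final class DensePartitionBuffer {
    private double[] values = new double[0]; // backing array, reused across partitions
    private int rows;
    private int cols;
    private long nnz; // maintained incrementally on every write

    /** Prepares the buffer for a rows x cols partition; reallocates only if it is too small. */
    public void reset(int rows, int cols) {
        int len = Math.multiplyExact(rows, cols);
        if (values.length < len) {
            values = new double[len];         // grow once; later partitions can reuse it
        } else {
            Arrays.fill(values, 0, len, 0.0); // clear only the region that will be used
        }
        this.rows = rows;
        this.cols = cols;
        this.nnz = 0;
    }

    /** Copies one collected row into place and adjusts nnz by the delta (no full rescan). */
    public void setRow(int r, double[] row) {
        if (r < 0 || r >= rows) throw new IndexOutOfBoundsException("row " + r);
        if (row.length != cols) throw new IllegalArgumentException("row length " + row.length);
        int off = r * cols;
        for (int c = 0; c < cols; c++) {
            double oldVal = values[off + c];
            double newVal = row[c];
            if (oldVal != 0 && newVal == 0) nnz--;      // non-zero overwritten by zero
            else if (oldVal == 0 && newVal != 0) nnz++; // zero overwritten by non-zero
            values[off + c] = newVal;
        }
    }

    public long getNonZeros() { return nnz; }

    public int capacity() { return values.length; }

    /** Full scan over the used region: the work that incremental maintenance avoids. */
    public long recomputeNonZeros() {
        long count = 0;
        for (int i = 0; i < rows * cols; i++) {
            if (values[i] != 0) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        DensePartitionBuffer buf = new DensePartitionBuffer();
        buf.reset(2, 3);
        buf.setRow(0, new double[] {1, 0, 2});
        buf.setRow(1, new double[] {0, 0, 3});
        System.out.println(buf.getNonZeros() + " " + buf.recomputeNonZeros()); // 3 3
        buf.reset(1, 3); // smaller partition: the 6-element array is reused
        System.out.println(buf.capacity()); // 6
    }
}
```

The design choice here is that clearing a reused array touches only the region the next partition needs, while the incremental nnz update costs one comparison per written cell. A real implementation may differ, for example by also switching to a sparse representation when the final nnz is low.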