[ https://issues.apache.org/jira/browse/SYSTEMML-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Boehm resolved SYSTEMML-1350.
--------------------------------------
    Resolution: Done

> Performance parfor spark datapartition-execute
> ----------------------------------------------
>
>                 Key: SYSTEMML-1350
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1350
>             Project: SystemML
>          Issue Type: Sub-task
>          Components: APIs, Runtime
>            Reporter: Matthias Boehm
>            Assignee: Matthias Boehm
>             Fix For: SystemML 1.0
>
>
> Our fused parfor spark datapartition-execute job, as used for large
> scenarios of univariate statistics, exhibits some unnecessary runtime
> overheads. In detail, the potential improvements include:
> 1) Incremental nnz maintenance on partition collect
> 2) Reuse of dense partitions per task (avoid reallocation)
> 3) Explicitly control the number of output partitions (avoid OOMs, reduce 
> memory pressure)
> 4) Avoid unnecessary rdd export on parfor data partitioning
> The points (3) and (4) also apply to the parfor spark datapartition job.
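
As a rough illustration of points (1) and (2) only, the following is a minimal sketch and not SystemML's actual implementation. The class PartitionCollector and its methods are hypothetical names invented for this example. It assumes a partition is collected row by row into a dense row-major buffer, keeps the non-zero count up to date on every write instead of rescanning the block, and reuses one buffer across the partitions handled by a task.

```python
from array import array


class PartitionCollector:
    """Hypothetical helper: collects rows of a partition into one dense buffer."""

    def __init__(self, rows, cols):
        self.rows = rows
        self.cols = cols
        # Allocated once per task and reused for every partition (point 2).
        self.buf = array("d", [0.0]) * (rows * cols)
        # Non-zero count, maintained incrementally on each write (point 1).
        self.nnz = 0

    def reset(self):
        """Clear the buffer in place so the next partition can reuse it."""
        for i in range(len(self.buf)):
            self.buf[i] = 0.0
        self.nnz = 0

    def set_row(self, r, values):
        """Copy one row into the buffer and adjust nnz by the per-cell delta."""
        if not 0 <= r < self.rows:
            raise IndexError("row index out of range")
        if len(values) != self.cols:
            raise ValueError("row length does not match the number of columns")
        start = r * self.cols
        for j, v in enumerate(values):
            old = self.buf[start + j]
            # +1 if a zero cell becomes non-zero, -1 for the reverse, else 0.
            self.nnz += int(v != 0.0) - int(old != 0.0)
            self.buf[start + j] = v


collector = PartitionCollector(rows=2, cols=3)
collector.set_row(0, [1.0, 0.0, 2.0])
collector.set_row(1, [0.0, 0.0, 3.0])
first_nnz = collector.nnz
buffer_id = id(collector.buf)

# Next partition in the same task: same buffer, no reallocation.
collector.reset()
collector.set_row(0, [0.0, 4.0, 0.0])
```

The per-cell delta keeps the count correct even when a row is overwritten, which a simple "add the new row's non-zeros" scheme would not.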



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
