[jira] [Updated] (SYSTEMML-1350) Performance parfor spark datapartition-execute

Matthias Boehm (JIRA) Fri, 24 Feb 2017 17:54:06 -0800

     [ 
https://issues.apache.org/jira/browse/SYSTEMML-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Matthias Boehm updated SYSTEMML-1350:
-------------------------------------
    Description: 
Our fused parfor spark datapartition-execute job - as used for large scenarios 
of univariate statistics - exhibits some unnecessary runtime overheads. In 
detail, the potential improvements include:

1) Incremental nnz maintenance on partition collect
2) Reuse of dense partitions per task (avoid reallocation)
3) Explicitly control the number of output partitions (avoid OOMs, reduce 
memory pressure)
4) Avoid unnecessary rdd export on parfor data partitioning

The points (3) and (4) also apply to the parfor spark datapartition job.
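
To illustrate points (1) and (2), here is a minimal sketch in plain Python. It is not SystemML's actual code: the class DensePartitionBuffer, the function process_partitions, and the (row, col, value) cell format are hypothetical names chosen only for this example. The idea is that one dense buffer is allocated per task and cleared between partitions, and that the non-zero count (nnz) is adjusted on every write instead of being recomputed by a full scan after the partition is collected.

```python
class DensePartitionBuffer:
    """Hypothetical reusable dense buffer for one partition (illustration only)."""

    def __init__(self, rows, cols):
        self.rows = rows
        self.cols = cols
        self.values = [0.0] * (rows * cols)  # allocated once per task
        self.nnz = 0                         # maintained incrementally

    def reset(self):
        # Clear in place so the same backing list is reused (no reallocation).
        for i in range(len(self.values)):
            self.values[i] = 0.0
        self.nnz = 0

    def set(self, r, c, v):
        idx = r * self.cols + c
        old = self.values[idx]
        # Adjust nnz by the change in "non-zero-ness" of this single cell.
        self.nnz += (v != 0.0) - (old != 0.0)
        self.values[idx] = v


def process_partitions(partitions, rows, cols):
    """Collect the cells of each partition into one reused buffer.

    `partitions` is an iterable of lists of (row, col, value) tuples.
    Returns the nnz of each partition, in order.
    """
    buf = DensePartitionBuffer(rows, cols)  # one allocation for the whole task
    nnz_per_partition = []
    for cells in partitions:
        buf.reset()
        for r, c, v in cells:
            buf.set(r, c, v)
        nnz_per_partition.append(buf.nnz)
    return nnz_per_partition
```

The trade-off in this sketch is that clearing the buffer still touches every cell, but it avoids repeated large allocations, and the nnz is already known when the partition is complete.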

  was:
Our fused parfor spark datapartition-execute job - as used for large scenarios 
of univariate statistics - exhibits some unnecessary runtime overheads. In 
detail, the potential improvements include:

1) Incremental nnz maintenance on partition collect
2) Reuse of dense partitions per task (avoid reallocation)
3) Explicitly control the number of output partitions (avoid OOMs, reduce 
memory pressure)

The last point also applies to the parfor spark datapartition job.


> Performance parfor spark datapartition-execute
> ----------------------------------------------
>
>                 Key: SYSTEMML-1350
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1350
>             Project: SystemML
>          Issue Type: Sub-task
>          Components: APIs, Runtime
>            Reporter: Matthias Boehm
>            Assignee: Matthias Boehm
>             Fix For: SystemML 1.0
>
>
> Our fused parfor spark datapartition-execute job - as used for large 
> scenarios of univariate statistics - exhibits some unnecessary runtime 
> overheads. In detail, the potential improvements include:
> 1) Incremental nnz maintenance on partition collect
> 2) Reuse of dense partitions per task (avoid reallocation)
> 3) Explicitly control the number of output partitions (avoid OOMs, reduce 
> memory pressure)
> 4) Avoid unnecessary rdd export on parfor data partitioning
> The points (3) and (4) also apply to the parfor spark datapartition job.


