[jira] [Updated] (SYSTEMML-1009) Avoid spark context creation on parfor optimization

Matthias Boehm (JIRA) Tue, 04 Oct 2016 13:58:39 -0700

     [ 
https://issues.apache.org/jira/browse/SYSTEMML-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Matthias Boehm updated SYSTEMML-1009:
-------------------------------------
    Description: 
Currently, every parfor script triggers the lazy Spark context creation, 
regardless of its input data size and script, in order to obtain memory budgets 
and parallelism. On small data, the Spark context creation dominates 
end-to-end execution time. We should replace this with a configuration-only 
analysis, which would avoid the context creation.

For example, here are the XS and S performance results for univariate 
statistics:
{code}
UnivariateStatistics on mbperftest/bivar/A_10k/data: 14
UnivariateStatistics on mbperftest/bivar/A_10k/data: 14
UnivariateStatistics on mbperftest/bivar/A_10k/data: 17
UnivariateStatistics on mbperftest/bivar/A_10k/data: 16

UnivariateStatistics on mbperftest/bivar/A_100k/data: 14
UnivariateStatistics on mbperftest/bivar/A_100k/data: 15
UnivariateStatistics on mbperftest/bivar/A_100k/data: 14
UnivariateStatistics on mbperftest/bivar/A_100k/data: 17
{code}
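
A minimal sketch of what a configuration-only analysis could look like, derived purely from configuration values without instantiating a Spark context. This is an illustration, not the SystemML implementation: the function names, the fallback defaults, and the `usable_fraction` parameter are assumptions made for the example, and only the three `spark.executor.*` property names are standard Spark configuration keys.

```python
import re

# Assumed fallbacks for illustration; actual defaults depend on the Spark
# version and cluster manager.
DEFAULTS = {
    "spark.executor.memory": "1g",
    "spark.executor.instances": "2",
    "spark.executor.cores": "1",
}

_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_memory(value):
    """Convert a size string such as '512m' or '4g' to bytes."""
    match = re.fullmatch(r"(\d+)([kmgt])b?", value.strip().lower())
    if match is None:
        raise ValueError(f"unsupported memory string: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def analyze_config(conf, usable_fraction=0.5):
    """Estimate memory budget and parallelism from configuration only.

    usable_fraction is a hypothetical knob for the share of executor
    memory the optimizer may plan with; no Spark context is created.
    """
    merged = {**DEFAULTS, **conf}
    executor_bytes = parse_memory(merged["spark.executor.memory"])
    executors = int(merged["spark.executor.instances"])
    cores = int(merged["spark.executor.cores"])
    return {
        "memory_budget_bytes": int(executor_bytes * usable_fraction),
        "parallelism": executors * cores,
    }


conf = {
    "spark.executor.memory": "4g",
    "spark.executor.instances": "3",
    "spark.executor.cores": "2",
}
result = analyze_config(conf)
print(result)
```

Because the sketch only reads a dictionary of properties, its cost is negligible compared to starting a context, which is the point of the proposal for small inputs.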

  was:Currently, every parfor script triggers the lazy Spark context creation, 
regardless of its input data size and script, in order to obtain memory budgets 
and parallelism. On small data, the Spark context creation dominates 
end-to-end execution time. We should replace this with a configuration-only 
analysis, which would avoid the context creation.


> Avoid spark context creation on parfor optimization
> ---------------------------------------------------
>
>                 Key: SYSTEMML-1009
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1009
>             Project: SystemML
>          Issue Type: Improvement
>            Reporter: Matthias Boehm
>
> Currently, every parfor script triggers the lazy Spark context creation, 
> regardless of its input data size and script, in order to obtain memory 
> budgets and parallelism. On small data, the Spark context creation 
> dominates end-to-end execution time. We should replace this with a 
> configuration-only analysis, which would avoid the context creation.
> For example, here are the XS and S performance results for univariate 
> statistics:
> {code}
> UnivariateStatistics on mbperftest/bivar/A_10k/data: 14
> UnivariateStatistics on mbperftest/bivar/A_10k/data: 14
> UnivariateStatistics on mbperftest/bivar/A_10k/data: 17
> UnivariateStatistics on mbperftest/bivar/A_10k/data: 16
> UnivariateStatistics on mbperftest/bivar/A_100k/data: 14
> UnivariateStatistics on mbperftest/bivar/A_100k/data: 15
> UnivariateStatistics on mbperftest/bivar/A_100k/data: 14
> UnivariateStatistics on mbperftest/bivar/A_100k/data: 17
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
