[ https://issues.apache.org/jira/browse/SYSTEMML-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Boehm updated SYSTEMML-1313:
-------------------------------------
    Description: 
The parfor optimizer may decide to execute the entire loop as a remote Spark 
job to utilize cluster parallelism. In this case, all inputs to the parfor body 
(i.e., variables that are created or read outside of the parfor body but used 
or overwritten inside) are read from HDFS. In the past there was an issue of 
redundant reads, which has been addressed with SYSTEMML-1879. However, the 
direct use of Spark broadcast variables would likely improve performance, 
especially in clusters with many nodes.

This task aims to leverage Spark broadcast variables for all parfor inputs. In 
detail, this entails two major aspects. First, we need runtime support to 
optionally broadcast the inputs via broadcast variables in 
{{RemoteParForSpark}} and obtain them from these broadcast variables in 
{{RemoteParForSparkWorker}} without causing unnecessary eviction. In contrast 
to the existing broadcast primitives, we don't need to blockify the matrix 
because the matrix is accessed in full by in-memory operations. Second, this 
requires an extension of the parfor optimizer to reason about scenarios where 
broadcasting is safe, because broadcasts create additional memory 
requirements: they act as pinned in-memory matrices. This second task likely 
overlaps with SYSTEMML-1349, which requires similar reasoning to handle 
shared reads.
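
For illustration, here is a minimal sketch of both aspects, assuming a 
simplified driver/worker split: raw {{double[][]}} stands in for SystemML's 
in-memory matrix representation, and {{broadcastInputs}}, {{getInput}}, and 
{{isBroadcastSafe}} are hypothetical names, not existing APIs.

{code:java}
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;

public class ParForBroadcastSketch
{
	// Driver side (conceptually in RemoteParForSpark): create one broadcast
	// per input matrix instead of reading it from HDFS in every task. Note
	// that the matrix is broadcast as a whole, without blockification.
	public static Map<String, Broadcast<double[][]>> broadcastInputs(
		JavaSparkContext sc, Map<String, double[][]> inputs )
	{
		Map<String, Broadcast<double[][]>> bcVars = new HashMap<>();
		for( Map.Entry<String, double[][]> e : inputs.entrySet() )
			bcVars.put(e.getKey(), sc.broadcast(e.getValue()));
		return bcVars;
	}

	// Worker side (conceptually in RemoteParForSparkWorker): obtain the
	// matrix from its broadcast variable; Spark caches the value per
	// executor, so repeated accesses do not re-transfer the data.
	public static double[][] getInput(
		Map<String, Broadcast<double[][]>> bcVars, String varname )
	{
		return bcVars.get(varname).value();
	}

	// Optimizer side: broadcasting is only safe if all broadcasts, which
	// act as pinned in-memory matrices, fit into the executor memory
	// budget together with the regular operation memory estimate.
	public static boolean isBroadcastSafe(
		double[] inputSizes, double opMemEstimate, double memBudget )
	{
		double bcSize = 0;
		for( double s : inputSizes )
			bcSize += s;
		return bcSize + opMemEstimate <= memBudget;
	}
}
{code}

Note that {{isBroadcastSafe}} only mirrors the kind of memory reasoning the 
optimizer extension would need; the actual decision has to account for the 
per-executor memory budget and existing operation memory estimates, which is 
the same reasoning required for the shared reads of SYSTEMML-1349.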


> Parfor broadcast exploitation
> -----------------------------
>
>                 Key: SYSTEMML-1313
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1313
>             Project: SystemML
>          Issue Type: Sub-task
>          Components: APIs, Runtime
>            Reporter: Matthias Boehm
>            Priority: Major


