[
https://issues.apache.org/jira/browse/SYSTEMML-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16445646#comment-16445646
]
LI Guobao commented on SYSTEMML-1313:
-------------------------------------
Hi [~mboehm7], I have introduce the runtime support and I'd like to run a test
forĀ it. And which test can I take? Should I install a yarn cluster so that the
parfor can be executed in mode `remote_spark`?
> Parfor broadcast exploitation
> -----------------------------
>
> Key: SYSTEMML-1313
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1313
> Project: SystemML
> Issue Type: Sub-task
> Components: APIs, Runtime
> Reporter: Matthias Boehm
> Assignee: LI Guobao
> Priority: Major
>
> The parfor optimizer may decide to execute the entire loop as a remote Spark
> job to utilize cluster parallelism. In this case all inputs to the parfor
> body (i.e., variable that are created or read outside of the parfor body but
> used or overwritten inside) are read from HDFS. In the past there was an
> issue of redundant reads, which has been addressed with SYSTEMML-1879.
> However, the direct use of Spark broadcast variables would likely improve
> performance, especially in clusters with many nodes.
> This task aims to leverage Spark broadcast variables for all parfor inputs.
> In detail this entails two major aspects. First, we need runtime support to
> optionally broadcast the inputs via broadcast variables in
> {{RemoteParForSpark}} and obtain them from these broadcast variables in
> {{RemoteParForSparkWorker}} without causing unnecessary eviction. In
> contrast, to the existing broadcast primitives, we don't need to blockify the
> matrix because the matrix is accessed in full by in-memory operations.
> Second, this requires an extension of the parfor optimizer to reason about
> scenarios where it is safe to use broadcast because these broadcasts cause
> additional memory requirements since they act as pinned in memory matrices.
> This second task has likely overlap with SYSTEMML-1349 which requires a
> similar reasoning to handle shared reads.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)