[https://issues.apache.org/jira/browse/SYSTEMML-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408671#comment-15408671]
Mike Dusenberry edited comment on SYSTEMML-845 at 8/5/16 12:27 AM:
-------------------------------------------------------------------
[~niketanpansare] A full run of {{mnist_lenet-train.dml}} in Spark local mode
with 50GB driver memory (and other settings as seen in {{perf.sh}}) had the
following output:
{code}
Total execution time: 8230.825 sec.
Number of executed Spark inst: 137923.
{code}
However, {{lenet-train.dml}}, run with the same settings, had the following output:
{code}
Total execution time: 2927.089 sec.
Number of executed Spark inst: 4.
{code}
So, these two scripts have the same performance in forced singlenode mode, but
when run with Spark (in local mode), {{mnist_lenet-train.dml}} is roughly 2.8x
slower and executes 137,923 Spark instructions versus 4 for {{lenet-train.dml}},
even with an excessive amount of memory (50GB; the scripts will run in 10GB or
less on a laptop). This is an awesome chance to make an optimizer improvement
for major gains.
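To make the gap concrete, here is a small Python sketch (not part of the attached scripts; the {{parse_stats}} helper is hypothetical) that pulls the two summary lines out of each stats output above and compares them.

```python
import re

# Patterns for the two summary lines printed in the stats output above.
TIME_RE = re.compile(r"Total execution time:\s*([\d.]+)\s*sec\.")
INST_RE = re.compile(r"Number of executed Spark inst:\s*(\d+)\.")


def parse_stats(text):
    """Extract (seconds, spark_instructions) from a stats summary (hypothetical helper)."""
    time_match = TIME_RE.search(text)
    inst_match = INST_RE.search(text)
    if time_match is None or inst_match is None:
        raise ValueError("stats summary lines not found")
    return float(time_match.group(1)), int(inst_match.group(1))


# Output copied from the two runs reported above.
nn_log = """Total execution time: 8230.825 sec.
Number of executed Spark inst: 137923."""
plain_log = """Total execution time: 2927.089 sec.
Number of executed Spark inst: 4."""

nn_time, nn_inst = parse_stats(nn_log)
plain_time, plain_inst = parse_stats(plain_log)

slowdown = nn_time / plain_time
print(f"mnist_lenet-train.dml is {slowdown:.2f}x slower")  # mnist_lenet-train.dml is 2.81x slower
print(f"extra Spark instructions: {nn_inst - plain_inst}")  # extra Spark instructions: 137919
```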
cc [~mboehm7]
> Compare Performance of LeNet Scripts With & Without Using SystemML-NN
> ---------------------------------------------------------------------
>
> Key: SYSTEMML-845
> URL: https://issues.apache.org/jira/browse/SYSTEMML-845
> Project: SystemML
> Issue Type: Improvement
> Reporter: Mike Dusenberry
> Attachments: convert.dml, lenet-train-spark-explain.log,
> log08.03.16-1470268602.txt, mnist_lenet-train-spark-explain.log, perf.sh,
> run.sh
>
>
> This JIRA issue tracks the comparison of the performance of the LeNet scripts
> with & without using SystemML-NN. The goal is that they should have equal
> performance in terms of both accuracy and time. Any differences will
> indicate areas for engine improvement.
> Scripts:
> * [mnist_lenet-train.dml|https://github.com/apache/incubator-systemml/blob/master/scripts/staging/SystemML-NN/examples/mnist_lenet-train.dml] - LeNet script that *does* use the SystemML-NN library.
> * [lenet-train.dml|https://github.com/apache/incubator-systemml/blob/master/scripts/staging/lenet-train.dml] - LeNet script that *does not* use the SystemML-NN library.
> To fully reproduce:
> # Create a directory and place the two attached bash scripts ({{run.sh}} and {{perf.sh}}) in it.
> # Grab a copy of the NN library and place it into the directory.
> # Run the {{examples/get_mnist_data.sh}} script from the library to get the data (placed into {{examples/data}}).
> # Use the attached {{convert.dml}} to create binary copies of the data for both scripts.
> # Copy {{examples/data}} to the base directory as well.
> # Run {{run.sh}}.
>
> Adjust the {{EXEC}} and related variables in {{perf.sh}} to switch between
> standalone and Spark modes, memory sizes, explain, stats, etc.
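The stated goal (equal accuracy and time for both scripts) can be checked mechanically once results are collected. Below is a minimal Python sketch, assuming results are recorded by hand into plain dicts; {{differing_metrics}} and the tolerance values are hypothetical choices, not anything defined in {{perf.sh}} or SystemML. It uses the standard library's {{math.isclose}}, whose relative tolerance is measured against the larger of the two magnitudes.

```python
import math


def differing_metrics(run_a, run_b, rel_tol):
    """Return the metric names on which two runs are NOT considered equal.

    run_a, run_b: dicts mapping metric name -> measured value.
    rel_tol: dict mapping metric name -> allowed relative difference
             (hypothetical thresholds, compared via math.isclose).
    Metrics missing from either run are skipped.
    """
    diffs = []
    for metric, tol in rel_tol.items():
        if metric not in run_a or metric not in run_b:
            continue  # not measured for both runs yet
        if not math.isclose(run_a[metric], run_b[metric], rel_tol=tol):
            diffs.append(metric)
    return diffs


# Only timings are reported in this issue; accuracy can be added once measured.
with_nn = {"seconds": 8230.825}     # mnist_lenet-train.dml
without_nn = {"seconds": 2927.089}  # lenet-train.dml
tolerances = {"seconds": 0.10, "accuracy": 0.005}  # hypothetical thresholds

print(differing_metrics(with_nn, without_nn, tolerances))  # ['seconds']
```

Skipping metrics that were not recorded for both runs keeps the check usable while only timings are available.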