That is a good question. Right now we apply codegen after hop rewrites
during both initial compilation and dynamic recompilation. There are
existing rewrites that similarly do operator fusion, which could be
removed, but we'll leave them in the system for now. These existing
fused operators are usually limited to 2 or 3 operators to make them
generally applicable, whereas codegen can aggressively compile large
script-specific operators.
Down the road, we'll aim to extend this framework to handle both
automatic rewrites and operator fusion (aka codegen) in a holistic
manner. Such a holistic approach would allow reasoning about side
effects, where rewrites influence fusion potential and vice versa.
However, for now, I'd like to get codegen into a production-ready state
before taking the next step in this direction.
Regards,
Matthias
On 4/20/2017 10:16 AM, dusenberr...@gmail.com wrote:
Excellent, I'll start experimenting with this in our deep learning work.
Question: what is the relationship between codegen and our rewrite rules?
--
Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry
On Apr 20, 2017, at 8:32 AM, Berthold Reinwald <reinw...@us.ibm.com> wrote:
This is awesome!
Regards,
Berthold Reinwald
IBM Almaden Research Center
office: (408) 927 2208; T/L: 457 2208
e-mail: reinw...@us.ibm.com
From: Matthias Boehm <mboe...@googlemail.com>
To: dev@systemml.incubator.apache.org
Date: 04/20/2017 02:41 AM
Subject: Experimental code generation
Hi all,
our new code generation feature is now sufficiently stable to enter
broader testing, with the goal of further improving its capabilities. If
you're interested, you can enable this feature via
<codegen.enabled>true</codegen.enabled>
in your SystemML-config.xml file. The major advantages are fewer
intermediates (read and write, incl. potentially fewer evictions), fewer
scans of inputs and intermediates, and better sparsity exploitation across
chains of operations.
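
To make these advantages concrete, here is a toy sketch for an expression such as sum(X * Y * Z). It is an illustration only and not SystemML code or the operators codegen actually emits: plain Python lists stand in for matrices, and the function names are hypothetical. The unfused version runs one operator at a time and materializes two intermediates; the fused version makes a single pass, allocates nothing, and skips zero entries of the first input.

```python
# Illustration only: plain Python lists stand in for matrices.
# Function names are hypothetical and not part of SystemML.

def unfused_sum_product(x, y, z):
    """Evaluate sum(x * y * z) operator by operator."""
    tmp1 = [a * b for a, b in zip(x, y)]      # intermediate 1: x * y
    tmp2 = [t * c for t, c in zip(tmp1, z)]   # intermediate 2: (x * y) * z
    return sum(tmp2)                          # separate pass for the aggregate


def fused_sum_product(x, y, z):
    """Same result in one pass over the inputs, with no intermediates."""
    total = 0.0
    for a, b, c in zip(x, y, z):
        if a == 0:
            # Sparsity exploitation: a zero in x makes the whole product zero
            # (assuming y and z hold finite values), so skip the work.
            continue
        total += a * b * c
    return total


x = [1.0, 0.0, 3.0, 0.0]
y = [2.0, 5.0, 4.0, 7.0]
z = [0.5, 9.0, 2.0, 1.0]

# Both evaluation strategies agree on the result.
assert unfused_sum_product(x, y, z) == fused_sum_product(x, y, z)
```

With few features, the intermediates in the unfused version are as large as the inputs themselves, which is why avoiding them matters most in that setting.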
On our mainstream algorithms, we see significant improvements compared to
existing fused operators for scenarios with few features, i.e., when the
vector and matrix intermediates become the bottleneck, or scripts with
missing sparsity-exploiting operations. For example, on a 100M x 10 (8GB)
scenario of L2SVM w/ 20 outer iterations, codegen improves performance
from 219s (496s without hand-coded fused operators) to 32s.
So please bring your favorite expressions. If you have interesting
scripts, please give it a try and share any issues or patterns that we're
currently not handling very well. Thanks.
Regards,
Matthias