That is a good question. Right now we apply codegen after HOP rewrites, during both initial compilation and dynamic recompilation. There are existing rewrites that similarly perform operator fusion, which could be removed, but we'll leave them in the system for now. These existing fused operators are usually limited to 2 or 3 operators to keep them generally applicable, whereas codegen can aggressively compile large, script-specific fused operators.
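
To make the difference concrete, here is a small, purely illustrative DML fragment (hypothetical variable names, not taken from any shipped algorithm): a hand-coded fused operator typically covers only a fixed 2-3 operator pattern, whereas codegen can compile the entire chain into a single generated operator.

  # Hypothetical expression (not from a shipped algorithm): codegen can
  # compile the whole chain into one fused operator, avoiding a dense
  # intermediate per step (matrix-vector product, exp, +, /, *, log, sum).
  ll = sum(y * log(1 / (1 + exp(-(X %*% w)))))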

Down the road, we aim to extend this framework to handle both automatic rewrites and operator fusion (aka codegen) in a holistic manner. Such a holistic approach would allow reasoning about their interplay, where rewrites influence fusion potential and vice versa. However, for now, I'd like to get codegen into a production-ready state before taking the next step in this direction.


Regards,
Matthias

On 4/20/2017 10:16 AM, dusenberr...@gmail.com wrote:
Excellent, I'll start experimenting with this in our deep learning work.

Question: what is the relationship between codegen and our rewrite rules?

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


On Apr 20, 2017, at 8:32 AM, Berthold Reinwald <reinw...@us.ibm.com> wrote:

This is awesome!

Regards,
Berthold Reinwald
IBM Almaden Research Center
office: (408) 927 2208; T/L: 457 2208
e-mail: reinw...@us.ibm.com



From:   Matthias Boehm <mboe...@googlemail.com>
To:     dev@systemml.incubator.apache.org
Date:   04/20/2017 02:41 AM
Subject:        Experimental code generation



Hi all,

meanwhile our new code generation feature is sufficiently stable to enter
broader testing, with the goal of further improving its capabilities. If
you're interested, you can enable this feature via

<codegen.enabled>true</codegen.enabled>

in your SystemML-config.xml file. The major advantages are fewer
intermediates (read and write, incl. potentially fewer evictions), fewer
scans of inputs and intermediates, and better sparsity exploitation across
chains of operations.
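
As an illustration of the fewer-scans aspect, consider a matrix-vector chain like the following (a hypothetical fragment with made-up variable names): evaluated operator by operator it requires two passes over X plus several vector intermediates, whereas a fused operator can compute the result in a single pass over X.

  # Hypothetical fragment: without fusion, X is scanned twice (once per
  # matrix-vector product) and each cell-wise step materializes another
  # vector; a fused operator computes g in a single pass over X.
  g = t(X) %*% (w * (X %*% v))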

On our mainstream algorithms, we see significant improvements compared to
existing fused operators for scenarios with few features, i.e., when the
vector and matrix intermediates become the bottleneck, or for scripts with
missing sparsity-exploiting operations. For example, on a 100M x 10 (8GB)
scenario of L2SVM w/ 20 outer iterations, codegen improves performance from
219s (496s without hand-coded fused operators) to 32s.
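
For intuition on why the vector intermediates dominate in such a tall-and-skinny scenario, here is a rough, simplified sketch of the kind of inner-loop steps involved (hypothetical, not the actual L2SVM script):

  # Simplified, hypothetical sketch of vector-heavy inner-loop steps;
  # without fusion, each cell-wise operation over the 100M x 1 vectors
  # produces a new intermediate that must be written and re-read.
  out = 1 - y * (Xw + step_sz * Xd)
  out = out * (out > 0)
  h = sum(out * out)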

So please bring your favorite expressions. If you have interesting scripts,
please give it a try and share any issues or patterns that we're currently
not handling very well. Thanks.


Regards,
Matthias




