Hi Dylan,
These are very interesting questions - let me answer them one by one:
0. SPOOF: We developed the SPOOF compiler framework in a separate fork
that will be integrated back into SystemML master soon. Initially, we
will add the code generation part as an experimental feature, likely in
our SystemML 1.0 release. The sum-product part will follow later because
it's still in a very early stage.
1a. Rewrites: At a high level, there are two types of rewrites: static
and dynamic. Static rewrites are size-independent, while dynamic
rewrites depend on size information in the form of constraints or
costs. During initial compilation, intra- and inter-procedural analysis
propagates only those sizes that are valid over the entire program
lifetime. The rewrites are then indeed applied in place (i.e.,
"destructively"), which is safe because these sizes are guaranteed not
to change. During dynamic recompilation, however, we use exact sizes
and recompile HOP DAGs very aggressively. To allow for non-reversible
rewrites, we keep the original HOP DAG, create a deep copy, rewrite the
copied HOP DAG, and finally generate LOPs and executable instructions.
You'll find the details here:
https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/hops/recompile/Recompiler.java
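To make the distinction concrete, here is a minimal DML sketch (the
sizes and the exact set of applied rewrites are illustrative and depend
on the SystemML version; $Zfile is a placeholder script parameter):

  X = rand(rows=1000, cols=1000);  # size known at compile time
  Y = t(t(X));                     # static rewrite candidate: t(t(X)) -> X
  Z = read($Zfile);                # size may only be known at runtime
  W = X %*% Z;                     # dynamic: the best plan depends on the
                                   # dimensions of Z, so this HOP DAG is a
                                   # candidate for dynamic recompilation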
1b. Rewrite Phase Ordering: Determining the order of rewrites, which is
often called phase ordering in compilers, is currently done manually,
using context knowledge of the side effects between individual
rewrites. This usually works very well in SystemML but gets more
complicated as we add more rewrites, and we've already seen a couple of
cases where phase ordering problems led to suboptimal plans. As far as
I know, there is no principled approach to phase ordering in other
compilers like GCC or LLVM either.
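As a simplified, hypothetical illustration of such a side effect,
suppose one rewrite folds constants and another removes multiplications
by one:

  Y = X * (2 - 1);  # constant folding:      X * (2 - 1) -> X * 1
                    # remove unnecessary op: X * 1       -> X

Applied in this order, both rewrites fire and Y becomes X; applied in
the reverse order, the removal rewrite does not match X * (2 - 1), and
the multiply would only disappear in a later pass, if at all.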
1c. Cost-based Optimization: Right now, different components use
different cost functions and heuristics. For example, matrix
multiplication chain optimization uses the number of floating-point
operations, operator selection for distributed matrix multiplications
uses I/O and shuffle costs weighted by the degree of parallelism, other
decisions simply use estimated sizes, and our resource optimizer uses a
full-fledged time-based cost model over the generated runtime plans
(see
https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/hops/cost/CostEstimatorStaticRuntime.java).
For SPOOF, we extended this time-based cost model.
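As a small worked example of the FLOP-based matrix multiplication chain
optimization (sizes chosen purely for illustration):

  A = rand(rows=1000, cols=1000);
  B = rand(rows=1000, cols=1000);
  v = rand(rows=1000, cols=1);
  # (A %*% B) %*% v: ~1000*1000*1000 + 1000*1000*1 ~= 1.0e9 multiply-adds
  # A %*% (B %*% v): ~1000*1000*1 + 1000*1000*1     = 2.0e6 multiply-adds
  y = A %*% B %*% v;

Based on these counts, the optimizer would parenthesize the chain as
A %*% (B %*% v), which is roughly 500x cheaper here.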
2. Explain: Yes, partially. We provide a flag -explain that allows
investigating the generated plans at HOP level (-explain hops), at
runtime level (-explain runtime), and during dynamic recompilation
(-explain recompile_hops, -explain recompile_runtime). Note, however,
that the HOP explain already shows the rewritten plans. As workarounds,
you can (1) set the optimization level in SystemML-config.xml to 1 in
order to see the initial plans without rewrites, or (2) set
ProgramRewriter.LDEBUG=true (and rebuild SystemML) to see the applied
rewrites. Furthermore, for task-parallel parfor programs, you can add
log=DEBUG in the parfor header to see the plan before recompilation,
after recompilation, and after rewrites, along with some details on the
individually applied rewrites.
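For example, a run with HOP-level explain looks like the following (the
jar and script names are placeholders; the exact invocation depends on
your setup):

  hadoop jar SystemML.jar -f script.dml -explain hops

and a parfor loop with optimizer logging enabled looks like:

  parfor(i in 1:16, log=DEBUG) {
    tmp = sum(X[,i]) * i;
  }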
3. Relationship to Apache Calcite: Well, Calcite is a cost-based
optimizer for relational algebra. As mentioned in (0), our sum-product
optimization is still in a very early stage, and in SystemML master we
focus purely on linear algebra and statistical functions - hence, there
is not much similarity. However, it is indeed an interesting question
whether our sum-product optimizer could be built on top of an existing
rewrite framework such as Calcite, Spark's Catalyst optimizer, or the
Columbia optimizer. So far, we tend toward building it from scratch
because our restricted linear algebra setting actually simplifies a
couple of rewrites.
I hope this gives a general overview - if you have further questions
about a specific topic, please just ask.
Regards,
Matthias
On 1/17/2017 4:05 AM, Dylan Hutchison wrote:
Hi there,
I learned about SystemML and its optimizer from the recent SPOOF paper
<http://cidrdb.org/cidr2017/papers/p3-elgamal-cidr17.pdf>. The gist I
absorbed is that SystemML translates linear algebra expressions given
in its DML to relational algebra, applies standard relational algebra
optimizations, and then re-recognizes the result as linear algebra
kernels, with an attempt to fuse them.
I think I found the SystemML rewrite rules here
<https://github.com/apache/incubator-systemml/tree/master/src/main/java/org/apache/sysml/hops/rewrite>.
A couple questions:
1. It appears that SystemML rewrites HOP expressions destructively,
i.e., by throwing away the old expression. In this case, how does SystemML
determine the order of rewrites to apply? Where does cost-based
optimization come into play?
2. Is there a way to "debug/visualize" the optimization process? That
is, when I start with a DML program, can I view (a) the DML program parsed
into HOPs; (b) what rules fire and where in the plan, as well as the plan
after each rule fires; and (c) the lowering and fusing of operators to LOPs?
I know this is a lot to ask for; I'm curious how far SystemML has gone
in this direction.
3. Is there any relationship between the SystemML optimizer and Apache
Calcite <https://calcite.apache.org/>? If not, I'd love to understand
the design decisions that differentiate the two.
Thanks, Dylan Hutchison