Hi Dylan,
These are very interesting questions - let me answer them one by one:
0. SPOOF: We developed the SPOOF compiler framework in a separate fork
that will be integrated back into SystemML master soon. Initially, we
will add the code generation part as an experimental feature, likely in
our SystemML 1.0 release. The sum-product part will follow later because
it's still in a very early stage.
1a. Rewrites: At a high level, there are two types of rewrites: static
and dynamic. Static rewrites are size-independent, while dynamic
rewrites depend on size information in the form of constraints or
costs. During initial compilation, intra- and inter-procedural analysis
propagates only those sizes that are valid over the entire program
lifetime. The rewrites are then indeed applied in place (i.e.,
"destructively"), which is safe because these sizes are guaranteed not
to change. During dynamic recompilation, however, we use exact sizes
and recompile HOP DAGs very aggressively. To allow for non-reversible
rewrites, we keep the original HOP DAG, create a deep copy, rewrite the
copied HOP DAG, and finally generate LOPs and executable instructions.
You'll find the details here:
https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/hops/recompile/Recompiler.java
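To make the distinction concrete, here is a minimal DML sketch (the
sizes and the exact set of applied rewrites are illustrative and depend
on the SystemML version; $Zfile is a placeholder script parameter):

  X = rand(rows=1000, cols=1000);  # size known at compile time
  Y = t(t(X));                     # static rewrite candidate: t(t(X)) -> X
  Z = read($Zfile);                # size may only be known at runtime
  W = X %*% Z;                     # dynamic: the best plan depends on the
                                   # dimensions of Z, so this HOP DAG is a
                                   # candidate for dynamic recompilation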
1b. Rewrite Phase Ordering: Determining the order of rewrites, which is
often called phase ordering in compilers, is currently done manually,
using context knowledge of the side effects between individual
rewrites. This usually works very well in SystemML but gets more
complicated as we add more rewrites, and we've already seen a couple of
cases where phase ordering problems led to suboptimal plans. As far as
I know, there is no principled approach to phase ordering in other
compilers like GCC or LLVM either.
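As a simplified, hypothetical illustration of such a side effect,
suppose one rewrite folds constants and another removes multiplications
by one:

  Y = X * (2 - 1);  # constant folding:      X * (2 - 1) -> X * 1
                    # remove unnecessary op: X * 1       -> X

Applied in this order, both rewrites fire and Y becomes X; applied in
the reverse order, the removal rewrite does not match X * (2 - 1), and
the multiply would only disappear in a later pass, if at all.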
1c. Cost-based Optimization: Right now, different components use
different cost functions and heuristics. For example, matrix
multiplication chain optimization uses the number of floating-point
operations, operator selection for distributed matrix multiplications
uses I/O and shuffle costs weighted by the degree of parallelism, other
decisions simply use estimated sizes, and our resource optimizer uses a
full-fledged time-based cost model over the generated runtime plans
(see
https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/hops/cost/CostEstimatorStaticRuntime.java).
For SPOOF, we extended this time-based cost model.
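As a small worked example of the FLOP-based matrix multiplication chain
optimization (sizes chosen purely for illustration):

  A = rand(rows=1000, cols=1000);
  B = rand(rows=1000, cols=1000);
  v = rand(rows=1000, cols=1);
  # (A %*% B) %*% v: ~1000*1000*1000 + 1000*1000*1 ~= 1.0e9 multiply-adds
  # A %*% (B %*% v): ~1000*1000*1 + 1000*1000*1     = 2.0e6 multiply-adds
  y = A %*% B %*% v;

Based on these counts, the optimizer would parenthesize the chain as
A %*% (B %*% v), which is roughly 500x cheaper here.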
2. Explain: Yes, partially. We provide a flag -explain that allows
investigating the generated plans at HOP level (-explain hops), at
runtime level (-explain runtime), and during dynamic recompilation
(-explain recompile_hops, -explain recompile_runtime). Note, however,
that the HOP explain already shows the rewritten plans. As workarounds,
you can (1) set the optimization level in SystemML-config.xml to 1 in
order to see the initial plans without rewrites, or (2) set
ProgramRewriter.LDEBUG=true (and rebuild SystemML) to see the applied
rewrites. Furthermore, for task-parallel parfor programs, you can add
log=DEBUG in the parfor header to see the plan before recompilation,
after recompilation, and after rewrites, along with some details on the
individually applied rewrites.
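For example, a run with HOP-level explain looks like the following (the
jar and script names are placeholders; the exact invocation depends on
your setup):

  hadoop jar SystemML.jar -f script.dml -explain hops

and a parfor loop with optimizer logging enabled looks like:

  parfor(i in 1:16, log=DEBUG) {
    tmp = sum(X[,i]) * i;
  }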
3. Relationship to Apache Calcite: Well, Calcite is a cost-based
optimizer for relational algebra. As mentioned in (0), our sum-product
optimization is still in a very early stage, and in SystemML master we
focus purely on linear algebra and statistical functions - hence, there
is not much similarity. However, it is indeed an interesting question
whether our sum-product optimizer could be built on top of an existing
rewrite framework such as Calcite, Spark's Catalyst optimizer, or the
Columbia optimizer. So far, we tend toward building it from scratch
because our restricted linear algebra setting actually simplifies a
couple of rewrites.
I hope this gives a general overview - if you have further questions
about a specific topic, please just ask.
Regards,
Matthias
On 1/17/2017 4:05 AM, Dylan Hutchison wrote:
Hi there,
I learned about SystemML and its optimizer from the recent SPOOF paper
<http://cidrdb.org/cidr2017/papers/p3-elgamal-cidr17.pdf>. The gist I
absorbed is that SystemML translates linear algebra expressions given
in its DML to relational algebra, applies standard relational algebra
optimizations, and then re-recognizes the result as linear algebra
kernels, with an attempt to fuse them.
I think I found the SystemML rewrite rules here
<https://github.com/apache/incubator-systemml/tree/master/src/main/java/org/apache/sysml/hops/rewrite>.
A couple questions:
1. It appears that SystemML rewrites HOP expressions destructively,
i.e., by throwing away the old expression. In this case, how does SystemML
determine the order of rewrites to apply? Where does cost-based
optimization come into play?
2. Is there a way to "debug/visualize" the optimization process? That
is, when I start with a DML program, can I view (a) the DML program parsed
into HOPs; (b) what rules fire and where in the plan, as well as the plan
after each rule fires; and (c) the lowering and fusing of operators to LOPs?
I know this is a lot to ask for; I'm curious how far SystemML has gone
in this direction.
3. Is there any relationship between the SystemML optimizer and Apache
Calcite <https://calcite.apache.org/>? If not, I'd love to understand
the design decisions that differentiate the two.
Thanks, Dylan Hutchison