Re: [Laszlo-dev] pre-RFC: target-specific code

Oliver Steele Fri, 02 Dec 2005 08:12:35 -0800

On Dec 2, 2005, at 8:51 AM, Asger Alstrup wrote:

[Using top stack element as alternative to register]

That's interesting to know. That is an optimization the compiler does not attempt yet.

It is very hard to do this optimisation automatically - especially since you have no way to access elements further down the stack. All you have is access to the top element, and swap.

And since most stuff requires using the stack, it might be very costly to promote the wrong variable to the top-stack element. In particular, it will be pretty difficult to implement a robust optimisation that does this automatically.

My suggestion would be an annotation: This variable should be allocated as the top-stack element. Alternatively, provide a specially named variable, which is the top-stack element. But maybe inline assembly is to prefer to such extensions.

Getting a bit off-topic, but...doing this automatically is a pretty interesting area, that I did some work on when I first wrote the script compiler. The context then was that there were only four registers and two of them were allocated as temporaries for specific constructs, so it would have been beneficial to leave values on the stack since the register pressure was so high and GetVariable and SetVariable are so expensive. I had thought that this (unimplemented) optimization was no longer useful as of swf9 and its 256 registers, but apparently I was wrong.

Here's how far I got:

Consider something like "a = b+1; c=a+2". This naively compiles into:

PUSH 1, r:b

ADD

SetRegister r:a

PUSH 2, r:a

ADD

SetRegister r:c

(where r:a stands for r:<n>, where <n> is the register number allocated to the variable 'a').

It could compile to:

PUSH 1, r:b

ADD

DUP

SetRegister r:a

PUSH 2

ADD

SetRegister r:c

In fact, the compiler could rewrite this particular example into "c= (a = b+1) + 2" without touching the back end. Or if b were dead at that point, propogation and constant folding could rewrite it to "c = a + 3". I'm using this example to stand for more complex cases that could only be handled as a back end optimization, though, such as "a = b + 1; c = a + 2; d = a + 3;", where you want to DUP the evaluation of "b+1" twice before storing it in a, or once if you've transformed this into "c = (a = b + 1) + 2; d = a + 3".

The strategy I tried was to draw paths on the _expression_ forest, from the computation of a value to its uses. The compiler then needs to pick the maximal set of non-intersecting paths. I'm not going to try drawing a tree here, but let [h1] stand for the head of path #1 and [t1] stand for a tail. Two of the expressions above can be annotated with these paths:

a = (b+1)[h1; c=a[t1]+2

b = (a = (b + 1)[h1]) + 2; d = a[t1] + 3

Since there's only one path, a DUP can be inserted at its head and the _expression_ that its tail annotates can be dropped, since the value is on the stack at that point.

There are a number of conditions on when a path can be used:

- It can't span a potential side effect to the referenced variable (such as a function call, if the variable is non-local or the call might be to a nested function within the local scope or to a function that recursively calls such a function)

- If it spans more than one block, all the control flows from the block that contains the head have to reach the block that contains the tail.

- It can't cover a stack value that is used within its span. In practice, this means that:

-- The head has to come from the right fringe of a tree, and the tail has to be on the left fringe. Where the branches of a tree are free of relative side effects and the node is an operation that commutes, the tree can be reordered to create this property, though.

-- The path can't intersect another path. [h1, h2, t2, t1] is okay, but [h1, h2, t1, t2] is not.

In order to find a maximal set of non-intersecting paths, you can ignore the last restriction and compute all the paths and then choose the maximal set from this, or use dynamic programming to do both these steps in one pass. A dynamic programming strategy works best for the fringe restriction too, because you might want one ordering to use one path and a different ordering to use another path --- I represented some nodes as unordered, with path restrictions on the order, and then the problem became one of collecting the maximal set of nonoverlapping paths that contained consistent node ordering restrictions.

The third _expression_ above has these annotations:

a = (b + 1)[h1][h2]; c = a[t1] + 2; d = a[t2] + 3;

a = (b + 1)[h1][h2]; c = a[t2] + 2; d = a[t1] + 3;

The first has the path trace [h1,h2,t1,t2], which contains intersecting paths [h1,t1] and [h2,t2]. It's not valid --- the value laid down by h1 would be picked up by t2, and vice versa. The second has the path trace [h1,h2,t2,t1], where [h2,t2] is nested inside [h1,t1]. This is valid. (In this particular case, both of these would generate the same instruction sequence, and it would work because the h1 and h2 values are the same.)

Anyway, I prototyped some of this in Haskell, but I ran out of time to implement it and we did inline assembly instead :-)

Also, I wasn't able to find any literature on this kind of optimization. Everything I could find on optimizing for stack machines assumed that there were a large number of registers, and that they were at least as fast as stack references.

[And finally --- way, way off topic -- the intersecting path condition is the same as the Path Containment Condition in linguistics, maybe because a deterministic stack machine interpreter automatically implements the Active Filler Strategy. Which I only mention because the linguistic analogy is what got me thinking along these lines.]

---

By the way, the compiler does implement a different optimization, that we call "push consolidation". Consider "a = b+1; c = b + 2;". This naively compiles into:

PUSH 1, r:b

ADD

SetRegister r:a

PUSH 2, r:b

ADD

SetRegister r:c

Push consolidation reduces the number of PUSH instructions:

PUSH 2, r:b, 1, r:b

ADD

SetRegister r:a

ADD

SetRegister r:c

_______________________________________________
Laszlo-dev mailing list
[email protected]
http://www.openlaszlo.org/mailman/listinfo/laszlo-dev

Re: [Laszlo-dev] pre-RFC: target-specific code

Reply via email to