For those interested, I put up the optimization article on the wiki instead of Wave; see it here: http://code.google.com/p/google-web-toolkit/wiki/AdvancedCompilerOptimizations It surveys and documents some potentially promising areas to look at. My suspicion is that a lot of Java idioms (Builders, Iterators, the Command pattern, the Wrapper pattern, etc.) are generating bloat, and that, given that Java is an object-oriented language that encourages these patterns, the biggest bang for the buck may come from optimizations that can transform, remove, or mitigate that bloat.
-Ray

2009/6/18 Miguel Méndez <[email protected]>
> This sounds like a very reasonable approach. Pruning clinits based on the
> whole-program CFG is definitely interesting, but is there an estimate of the
> savings (speed/size) that would be gained from that optimization? I ask
> just as a cross-check.
>
> On Wed, Jun 17, 2009 at 1:16 PM, Ray Cromwell <[email protected]> wrote:
>
>> Miguel, for what it's worth, I agree wholeheartedly. Object-oriented
>> programming tends to encourage small block sizes except where inlining
>> helps to enlarge them, so inter-block definitely helps a lot. Moreover,
>> you can't really build a good CFA/DFA without inter-block information
>> propagation. Here's an off-the-cuff proposal for doing this later:
>>
>> Step 1, as I outlined, is to take JExpressions which are not literals,
>> fieldrefs, etc. and introduce temporaries so as to bring a three-address
>> style into the AST. For example:
>>
>>   a + b, where a is a unary/binary/etc. op, becomes t = a; t + b
>>   return expr becomes t = expr; return t
>>   while(expr) becomes t = expr; while(t) { ...; t = expr; }
>>
>> This gives us an AST structure amenable to traditional analysis without
>> introducing a totally new IR; we can reuse most of the existing JNode
>> classes. Unlike what you'd find in the Dragon Book, though, we keep
>> high-level ops for loops, conditionals, method calls, and everything else.
>>
>> Step 2: introduce a new JPhi node (extends JNode). Build the CFG. Add
>> steps to compute dominator trees/dominance frontiers and use these to
>> insert JPhi nodes into the ASTs. Modify all of the existing visitors to
>> deal with JPhi. Add a stage to un-SSA.
>>
>> Step 3: introduce optimizations on this structure. Edge-split form.
>> Faster 'optimal' dominance algorithms (I say use something simpler like
>> the Cooper technique at first:
>> http://www.hipersoft.rice.edu/grads/publications/dom14.pdf).
>>
>> The crucial question is whether to build a procedure-level CFG or a
>> whole-program CFG.
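[Editor's note: the Step 1 flattening above can be sketched concretely. This is a minimal illustration over an invented toy expression tree; Expr, Name, and Bin are hypothetical stand-ins, not the real GWT JNode/JExpression classes.]

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of "step 1": hoist every compound subexpression into a fresh
// temporary, yielding three-address-style statements.
public class Tac {
    public static abstract class Expr {}
    public static class Name extends Expr {
        final String id;
        public Name(String id) { this.id = id; }
    }
    public static class Bin extends Expr {
        final String op; final Expr lhs, rhs;
        public Bin(String op, Expr lhs, Expr rhs) { this.op = op; this.lhs = lhs; this.rhs = rhs; }
    }

    public final List<String> code = new ArrayList<>();
    private int nextTemp = 0;

    // Returns the simple name holding the expression's value, emitting
    // "t = a OP b" statements bottom-up for every compound subexpression.
    public String flatten(Expr e) {
        if (e instanceof Name) return ((Name) e).id;
        Bin b = (Bin) e;
        String l = flatten(b.lhs);
        String r = flatten(b.rhs);
        String t = "t" + (++nextTemp);
        code.add(t + " = " + l + " " + b.op + " " + r);
        return t;
    }

    public static void main(String[] args) {
        // x = a + b + c + d, parsed left-associatively.
        Tac tac = new Tac();
        Expr sum = new Bin("+", new Bin("+", new Bin("+",
                new Name("a"), new Name("b")), new Name("c")), new Name("d"));
        tac.code.add("x = " + tac.flatten(sum));
        tac.code.forEach(System.out::println);
        // Prints: t1 = a + b / t2 = t1 + c / t3 = t2 + d / x = t3
    }
}
```

A real pass would of course thread the temporaries back into JLocalRefs and preserve evaluation order for side effects; the sketch only shows the shape of the rewrite.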
One of the things that interested me about the potential of
>> building a limited whole-program CFG was the ability to aggressively
>> (and correctly) prune clinits. Now, if the eager clinit hoisting is
>> acceptable, then this would be overkill; but if not, then the way to do
>> it is to compute the dominator sets for each block.
>>
>> If block B_j contains a call to clinitA() and it is dominated by some
>> block B_i that also contains a call to clinitA(), then it is safe to
>> remove clinitA() from B_j, since all possible paths of execution
>> reaching B_j must first go through B_i, guaranteeing that clinitA() has
>> already been invoked.
>>
>> -Ray
>>
>> 2009/6/17 Miguel Méndez <[email protected]>
>>
>>> If you are going to tackle the intra-block optimization, you might as
>>> well go for inter-block optimization with block summaries that merge
>>> across the control flow. The latter is not that much harder, since you
>>> summarize the intra-block data and propagate. But yes, doing SSA for
>>> real is a non-trivial amount of work.
>>>
>>> On Wed, Jun 17, 2009 at 11:44 AM, Ray Cromwell <[email protected]> wrote:
>>>
>>>> I agree. My current proposal really isn't SSA, but a poor man's SSA:
>>>> simply flatten blocks by introducing temporaries, renumber the
>>>> existing variables, perform classic intra-block optimizations, and
>>>> then undo. This would at least result in some modest block-level
>>>> improvements, including making the builder pattern more optimal. The
>>>> in-block renumbering procedure I've come up with essentially
>>>> constructs the liveness-range information as you're doing it, without
>>>> having to deal with the more exotic SSA algorithms. It might be small
>>>> enough to perform a useful experiment without too much work.
>>>>
>>>> Doing real SSA would require quite a bit more work, especially if you
>>>> want to deal fully with exceptions and arrays.
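[Editor's note: the dominance-based pruning rule above can be sketched as follows. The CFG encoding and the iterative dominator computation are simplified stand-ins for whatever the real compiler would use; the point is only the rule itself: a clinitA() call in block j is removable when some strict dominator of j also calls clinitA().]

```java
import java.util.*;

// Toy sketch of dominator-based clinit pruning over a small CFG.
public class ClinitPrune {
    // Iterative dominator analysis over block ids 0..n-1, block 0 = entry.
    // preds[j] lists the predecessors of block j. dom(j) is the set of
    // blocks that dominate j (including j itself).
    public static List<Set<Integer>> dominators(int n, int[][] preds) {
        Set<Integer> all = new HashSet<>();
        for (int i = 0; i < n; i++) all.add(i);
        List<Set<Integer>> dom = new ArrayList<>();
        for (int i = 0; i < n; i++)
            dom.add(i == 0 ? new HashSet<>(Set.of(0)) : new HashSet<>(all));
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int j = 1; j < n; j++) {
                // Meet: intersection of the dominator sets of all predecessors.
                Set<Integer> meet = new HashSet<>(all);
                for (int p : preds[j]) meet.retainAll(dom.get(p));
                meet.add(j);
                if (!meet.equals(dom.get(j))) { dom.set(j, meet); changed = true; }
            }
        }
        return dom;
    }

    // clinitCalls[j] = true if block j calls clinitA(). The call in block j
    // is redundant if any strict dominator of j already makes it.
    public static boolean redundant(int j, boolean[] clinitCalls, List<Set<Integer>> dom) {
        for (int d : dom.get(j))
            if (d != j && clinitCalls[d]) return true;
        return false;
    }

    public static void main(String[] args) {
        // Diamond CFG: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
        int[][] preds = { {}, {0}, {0}, {1, 2} };
        boolean[] calls = { true, false, false, true }; // clinitA() in blocks 0 and 3
        List<Set<Integer>> dom = dominators(4, preds);
        System.out.println(redundant(3, calls, dom)); // true: block 0 dominates 3
        System.out.println(redundant(0, calls, dom)); // false: entry call must stay
    }
}
```

The O(n^2)-ish set-based iteration is only for clarity; a production pass would use the Cooper/Harvey/Kennedy idom scheme referenced above.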
That to me is something that's too big of a change to the current
>>>> compiler infrastructure at the moment.
>>>>
>>>> -Ray
>>>>
>>>> 2009/6/17 Miguel Méndez <[email protected]>
>>>>
>>>>> Longer term, moving towards an SSA variant has a lot of advantages.
>>>>> In the near term, simply experimenting with performing the classic
>>>>> intra-procedural (block global) optimizations wouldn't necessarily
>>>>> require SSA, although SSA does make the bookkeeping easier and
>>>>> faster. If you introduced a pass to perform the intra-procedural
>>>>> optimizations that did not violate the invariants expected by
>>>>> subsequent optimizations (procedure is still in tree form), you'd go
>>>>> a long way.
>>>>>
>>>>> If you do go with SSA, it's my experience that you should go all the
>>>>> way and deal properly with the phi functions; otherwise you end up
>>>>> with a lot more temps than you would otherwise, and therefore more
>>>>> "live" refs, etc.
>>>>>
>>>>> On Wed, Jun 17, 2009 at 1:12 AM, Ray Cromwell <[email protected]> wrote:
>>>>>
>>>>>> I agree, although I find having a wiki with inline responses rather
>>>>>> convenient. Here is what I have so far from the Wave; at the end,
>>>>>> there is a possible solution to the problem of optimizing away the
>>>>>> builder pattern using general-purpose optimizations.
>>>>>>
>>>>>> -Ray
>>>>>>
>>>>>> Advanced GWT Compiler Optimizations
>>>>>>
>>>>>> The purpose of this wave is to talk about a list of future compiler
>>>>>> improvements / optimizations for GWT that go beyond the scope of
>>>>>> those listed in the current project wiki.
>>>>>> Hybrid Intermediate Representations (Tree SSA)
>>>>>> Control Flow / Data Flow Analysis
>>>>>> Common Subexpression Elimination / Partial Redundancy Elimination
>>>>>> Block-Level Dead Code Elimination
>>>>>> Copy Propagation
>>>>>> Register Allocation (temporary/local reduction)
>>>>>> Escape Analysis
>>>>>> Object Inlining / Destructuring
>>>>>> Speculative Inlining / Function Cloning
>>>>>> Efficient Anonymous Inner Classes
>>>>>> Loop Hoisting / Loop Induction
>>>>>>
>>>>>> Intermediate Representations
>>>>>>
>>>>>> In order to perform many optimizations, it is easier to deal with a
>>>>>> flatter data structure than the traditional AST, such as classic
>>>>>> three-address code, but using pure three-address code for everything
>>>>>> has its own downsides, as well as requiring substantial changes to
>>>>>> the compiler.
>>>>>>
>>>>>> Matt M suggested a hybrid approach of only converting JExpressions
>>>>>> to SSA form. This would have the benefit of localizing the changes.
>>>>>> It reminded me of GCC's Tree SSA, which, upon re-review, looks like
>>>>>> it can offer us some useful lessons.
>>>>>>
>>>>>> As an example, consider the expression x = a + b + c + d. In the
>>>>>> current AST, this will look something like
>>>>>> (assign x (+ d (+ c (+ a b)))). We can flatten this by introducing
>>>>>> temporaries, arriving at:
>>>>>>
>>>>>>   t1 = a + b
>>>>>>   t2 = t1 + c
>>>>>>   t3 = t2 + d
>>>>>>   x = t3
>>>>>>
>>>>>> In addition, if we are only dealing with simple blocks (no
>>>>>> branches), we can use a much simpler kind of SSA, which is simply to
>>>>>> rename variables on assignment.
>>>>>> Thus, if you have
>>>>>>
>>>>>>   x = 1
>>>>>>   y = 2
>>>>>>   z = x + y
>>>>>>   x++
>>>>>>   y++
>>>>>>   z = x + y
>>>>>>
>>>>>> you can rename each variable on assignment, yielding the following:
>>>>>>
>>>>>>   x1 = 1
>>>>>>   y1 = 2
>>>>>>   z1 = x1 + y1
>>>>>>   x2 = x1 + 1
>>>>>>   y2 = y1 + 1
>>>>>>   z2 = x2 + y2
>>>>>>
>>>>>> This produces an SSA-like form, with each variable defined only
>>>>>> once. At first glance, it looks like a de-optimization, but a later
>>>>>> pass that does constant folding, copy propagation, and dead-code
>>>>>> elimination will clean this up. As an example, after copy propagation:
>>>>>>
>>>>>>   x1 = 1
>>>>>>   y1 = 2
>>>>>>   z1 = 1 + 2
>>>>>>   x2 = 1 + 1
>>>>>>   y2 = 2 + 1
>>>>>>   z2 = x2 + y2
>>>>>>
>>>>>> after constant folding:
>>>>>>
>>>>>>   x1 = 1
>>>>>>   y1 = 2
>>>>>>   z1 = 3
>>>>>>   x2 = 2
>>>>>>   y2 = 3
>>>>>>   z2 = x2 + y2
>>>>>>
>>>>>> after DCE (x1/y1/z1 no longer referenced):
>>>>>>
>>>>>>   x2 = 2
>>>>>>   y2 = 3
>>>>>>   z2 = x2 + y2
>>>>>>
>>>>>> after copy prop:
>>>>>>
>>>>>>   x2 = 2
>>>>>>   y2 = 3
>>>>>>   z2 = 2 + 3
>>>>>>
>>>>>> after constant folding:
>>>>>>
>>>>>>   x2 = 2
>>>>>>   y2 = 3
>>>>>>   z2 = 5
>>>>>>
>>>>>> after DCE:
>>>>>>
>>>>>>   z2 = 5
>>>>>>
>>>>>> Finally, after renaming registers back to their original names:
>>>>>>
>>>>>>   z = 5
>>>>>>
>>>>>> A register-allocation-style pass can further eliminate temporaries
>>>>>> later.
>>>>>>
>>>>>> However, to do proper CFA/DFA, we may want to build a real SSA form,
>>>>>> complete with phi functions, so we can do cross-block optimizations.
>>>>>>
>>>>>> Effects on the Builder Pattern
>>>>>>
>>>>>> Another recent list discussion concerned optimizing the Builder
>>>>>> Pattern.
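[Editor's note: the cleanup passes in the walkthrough above, constant/copy propagation, constant folding, and a dead-code sweep, can be sketched over a toy single-assignment instruction list. The Ins encoding is invented for the sketch; this is the same style of cleanup that collapses the renamed x1/y1/z1/x2/y2/z2 example to "z2 = 5".]

```java
import java.util.*;

// Toy cleanup pipeline over renamed, single-assignment straight-line code:
// forward constant propagation + folding, then a backward dead-code sweep.
public class Cleanup {
    // dest = a op b, or a plain copy "dest = a" when op is null.
    public record Ins(String dest, String op, String a, String b) {}

    private static boolean isConst(String s) { return s.matches("-?\\d+"); }

    public static List<Ins> optimize(List<Ins> code, Set<String> liveOut) {
        // Forward pass: substitute known constants, fold "+" when possible.
        Map<String, String> known = new HashMap<>(); // var -> constant text
        List<Ins> folded = new ArrayList<>();
        for (Ins i : code) {
            String a = known.getOrDefault(i.a(), i.a());
            String b = i.b() == null ? null : known.getOrDefault(i.b(), i.b());
            if ("+".equals(i.op()) && isConst(a) && isConst(b)) {
                a = Integer.toString(Integer.parseInt(a) + Integer.parseInt(b));
                b = null; // folded to a constant copy
            }
            Ins out = new Ins(i.dest(), b == null ? null : i.op(), a, b);
            if (out.op() == null && isConst(out.a())) known.put(out.dest(), out.a());
            folded.add(out);
        }
        // Backward pass: drop definitions nobody uses (safe because each
        // variable is assigned exactly once in this form).
        Set<String> live = new HashSet<>(liveOut);
        List<Ins> kept = new ArrayList<>();
        for (int i = folded.size() - 1; i >= 0; i--) {
            Ins ins = folded.get(i);
            if (!live.contains(ins.dest())) continue;
            kept.add(0, ins);
            live.add(ins.a());
            if (ins.b() != null) live.add(ins.b());
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Ins> code = List.of(
            new Ins("x1", null, "1", null),
            new Ins("y1", null, "2", null),
            new Ins("z1", "+", "x1", "y1"),
            new Ins("x2", "+", "x1", "1"),
            new Ins("y2", "+", "y1", "1"),
            new Ins("z2", "+", "x2", "y2"));
        for (Ins i : optimize(code, Set.of("z2")))
            System.out.println(i.dest() + " = " + i.a()
                + (i.b() != null ? " " + i.op() + " " + i.b() : ""));
        // Prints: z2 = 5  (un-renaming then restores z = 5)
    }
}
```

With inlining added in front, this same propagate-then-sweep machinery is what squeezes the Builder chain below down to plain field writes.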
Flattening the AST and using SSA techniques can have a positive
>>>>>> effect on this as well. Consider:
>>>>>>
>>>>>>   Builder b = new Builder();
>>>>>>   b.setA(10).setB(20).setC(30);
>>>>>>
>>>>>> After conversion to static methods:
>>>>>>
>>>>>>   setC(setB(setA(b, 10), 20), 30);
>>>>>>
>>>>>> Flattening/SSA:
>>>>>>
>>>>>>   t1 = setA(b, 10)
>>>>>>   t2 = setB(t1, 20)
>>>>>>   t3 = setC(t2, 30)
>>>>>>
>>>>>> After inlining:
>>>>>>
>>>>>>   b.A = 10
>>>>>>   t1 = b
>>>>>>   t1.B = 20
>>>>>>   t2 = t1
>>>>>>   t2.C = 30
>>>>>>   t3 = t2
>>>>>>
>>>>>> After copy prop:
>>>>>>
>>>>>>   b.A = 10
>>>>>>   t1 = b
>>>>>>   b.B = 20
>>>>>>   t2 = b
>>>>>>   b.C = 30
>>>>>>
>>>>>> After dead-code elimination:
>>>>>>
>>>>>>   b.A = 10
>>>>>>   b.B = 20
>>>>>>   b.C = 30
>>>>>>
>>>>>> On Tue, Jun 16, 2009 at 6:30 PM, Bruce Johnson <[email protected]> wrote:
>>>>>> > Re: Wave access, I was really mostly just being playful, but I'll
>>>>>> > most certainly beg and plead to get this group early in the line
>>>>>> > if at all possible.
>>>>>> >
>>>>>> > Meanwhile, I agree with Cameron that we should make sure to
>>>>>> > discuss the major ideas here on the mailing list, even if some of
>>>>>> > the initial specifics are fleshed out in a wave. I definitely
>>>>>> > don't want people who don't yet have Wave accounts to feel
>>>>>> > penalized just because they weren't able to go to I/O. (But if you
>>>>>> > didn't go to I/O this year, I really hope you can next year --
>>>>>> > it's fantastic to meet together in person.)
>>>>>> >
>>>>>> > On Tue, Jun 16, 2009 at 9:22 PM, Cameron Braid <[email protected]> wrote:
>>>>>> >>
>>>>>> >> Unfortunately I wasn't able to attend Google I/O; however, I did
>>>>>> >> watch a
few of the sessions online.
>>>>>> >>
>>>>>> >> I signed up for the sandbox, but it appears that Google aren't
>>>>>> >> going to grant access to non-I/O attendees for a little while yet.
>>>>>> >>
>>>>>> >> I'd appreciate an invite, but I understand if that's not feasible.
>>>>>> >>
>>>>>> >> Cheers
>>>>>> >>
>>>>>> >> Cameron
>>>>>> >>
>>>>>> >> 2009/6/17 Ray Cromwell <[email protected]>
>>>>>> >>>
>>>>>> >>> Cameron, were you at Google I/O, or did you sign up for the
>>>>>> >>> sandbox? I can ask a Googler to invite you if not.
>>>>>> >>> -Ray
>>>>>> >>>
>>>>>> >>> On Mon, Jun 15, 2009 at 6:21 PM, Cameron Braid <[email protected]> wrote:
>>>>>> >>>>
>>>>>> >>>> 2009/6/16 Bruce Johnson <[email protected]>
>>>>>> >>>>>
>>>>>> >>>>> I'm starting to make a bit o' progress on this. I'll send out
>>>>>> >>>>> a design doc "real soon now".
>>>>>> >>>>>
>>>>>> >>>>> BTW, anyone on the Contributors list here have Wave sandbox
>>>>>> >>>>> accounts? Sure would be easier to discuss this in a wave...
>>>>>> >>>>
>>>>>> >>>> No :(
>>>>>> >>>>
>>>>>> >>>> But if you are offering invites, send one this way :)
>>>>>> >>>> Otherwise, it's probably wise to keep the discussion in a place
>>>>>> >>>> where all interested parties can participate.
>>>>>> >>>>
>>>>>> >>>> Cam
>>>>>
>>>>> --
>>>>> Miguel

http://groups.google.com/group/Google-Web-Toolkit-Contributors
