On Tue, 17 Jun 2008 11:56:00 +0200 Nicolai Hähnle <[EMAIL PROTECTED]> wrote:
> Hey Aapo,
>
> On Tuesday, 17 June 2008 04:07:01, Aapo Tahkola wrote:
> > On Mon, 16 Jun 2008 12:56:39 +0200
> > Nicolai Hähnle <[EMAIL PROTECTED]> wrote:
> > > I want a compiler infrastructure that can do more than one pass
> > > over the program that is to be compiled. I also want to be able
> > > to do passes that are more complex than a linear walk through
> > > instructions while looking at only one instruction at a time. For
> > > example, I'm thinking of:
> > > - a very simple algorithm for dead code elimination that walks
> > >   through the program *backwards*
> > > - an algorithm to merge MUL and ADD into MAD
> >
> > I once wrote an algo that had the ability to remove all write masks
> > and swizzles of all instructions that do not contribute to the
> > results. Simply dropping instructions with no write mask implements
> > dead code elimination.
>
> Did you publish that code somewhere? I was thinking of implementing
> the exact same thing some time in the future, but if you already have
> something like it that can be adapted...

No, didn't release it. I'm not even sure if I still have it (blew a
couple of hard disks a few years back). Have to check my desktop HDs
when I get a chance.

> > Following that, instructions can be divided into two groups:
> > - instructions that have fixed output and thus determine which
> >   components of temporary registers must be fixed
> > - instructions where all result components correspond to same
> >   calculation (mad, xpd, ...)
> >
> > By properly combining these two you'd get optimal temporary register
> > usage. IIRC, the problem I did not solve was how to rearrange
> > instructions of two or more distinct calculations that join up
> > later in the program so that you'd use minimal amount of temporary
> > registers.
>
> That problem is exactly why getting optimal temporary register usage
> *isn't* that simple ;)
>
> I recall that there's an algorithm based on dynamic programming which
> does it.
> In general though, I have a feeling that we're too often
> trying to get a perfect solution in the first cut. I'd much rather
> add simple but useful optimizations at first.

Yep. It might actually work well enough without reordering.
There are/were some cases where fixed pipeline programs used more than
24 temps. Not sure about the latest generation, but using more temps on
r4xx does not slow down fragment programs.

--
Aapo Tahkola

_______________________________________________
Mesa3d-dev mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev
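
For reference, the backwards walk with write masks discussed in this thread could look roughly like the sketch below. This is not the lost code and not anything from Mesa; the Instr class, the FIXED_READS table and eliminate_dead_code() are all hypothetical names made up for illustration. It assumes a straight-line program (no branches), four-character swizzles, and that only the listed output registers are live at the end. It narrows write masks and drops instructions whose mask ends up empty; it does not rewrite swizzles.

```python
from dataclasses import dataclass

COMPONENTS = "xyzw"

@dataclass
class Instr:
    """Hypothetical instruction record, not a Mesa structure."""
    op: str
    dst: str      # destination register name
    mask: str     # write mask, e.g. "xyz"
    srcs: list    # list of (register, 4-char swizzle) pairs

# Assumption: ops listed here read a fixed set of swizzle lanes from every
# source no matter which destination components are written.
FIXED_READS = {"DP3": (0, 1, 2), "DP4": (0, 1, 2, 3)}

def eliminate_dead_code(program, outputs):
    """Walk the program backwards, shrinking write masks to the components
    that are still needed and dropping instructions left with no mask."""
    # register name -> set of components that some later instruction reads
    live = {reg: set(COMPONENTS) for reg in outputs}
    kept = []
    for ins in reversed(program):
        needed = [c for c in ins.mask if c in live.get(ins.dst, ())]
        if not needed:
            continue  # empty write mask: nothing downstream uses the result
        # These components are defined here, so they are dead above this point.
        live[ins.dst] -= set(needed)
        if ins.op in FIXED_READS:
            lanes = FIXED_READS[ins.op]
        else:
            # Component-wise op: result component i reads swizzle lane i.
            lanes = [COMPONENTS.index(c) for c in needed]
        for reg, swizzle in ins.srcs:
            live.setdefault(reg, set()).update(swizzle[i] for i in lanes)
        kept.append(Instr(ins.op, ins.dst, "".join(needed), ins.srcs))
    kept.reverse()
    return kept

# Small example: the ADD result is never read, and only t0.xy is used.
prog = [
    Instr("MUL", "t0", "xyzw", [("in0", "xyzw"), ("in1", "xyzw")]),
    Instr("ADD", "t1", "xyzw", [("in0", "xyzw"), ("in1", "xyzw")]),
    Instr("MOV", "out", "xyzw", [("t0", "xxyy")]),
]
result = eliminate_dead_code(prog, ["out"])
```

In the example the ADD disappears entirely and the MUL is narrowed to a .xy write mask, which is the "drop instructions with no write mask" idea in its simplest form. Reordering instructions to minimise temporaries is a separate problem and is not attempted here.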
