On Tue, 17 Jun 2008 11:56:00 +0200
Nicolai Hähnle <[EMAIL PROTECTED]> wrote:

> Hey Aapo,
> 
> On Tuesday 17 June 2008 04:07:01, Aapo Tahkola wrote:
> > On Mon, 16 Jun 2008 12:56:39 +0200
> > Nicolai Hähnle <[EMAIL PROTECTED]> wrote:
> > > I want a compiler infrastructure that can do more than one pass
> > > over the program that is to be compiled. I also want to be able
> > > to do passes that are more complex than a linear walk through
> > > instructions while looking at only one instruction at a time. For
> > > example, I'm thinking of:
> > > - a very simple algorithm for dead code elimination that walks
> > > through the program *backwards*
> > > - an algorithm to merge MUL and ADD into MAD
> >
> > I once wrote an algorithm that could strip, from every instruction,
> > the write mask components and swizzles that do not contribute to the
> > results. Simply dropping instructions left with an empty write mask
> > then implements dead code elimination.
> 
> Did you publish that code somewhere? I was thinking of implementing
> the exact same thing some time in the future, but if you already have
> something like it that can be adapted...

No, I didn't release it. I'm not even sure I still have it (I blew a
couple of hard disks a few years back). I'll have to check my desktop
hard disks when I get a chance.
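
For illustration, the write mask pruning described in the quoted text
could look roughly like the Python sketch below. It is not the lost code.
The Instr representation, the prune_write_masks name, and the handling of
DP3 as the only non-component-wise opcode are all assumptions made for
this example, and it only trims write masks (swizzles are left alone) in
straight-line code without flow control.

```python
from dataclasses import dataclass

COMPONENTS = "xyzw"

@dataclass
class Instr:
    # Hypothetical instruction record, invented for this sketch.
    op: str      # opcode, e.g. "MOV", "MUL", "ADD", "DP3"
    dst: str     # destination register name
    mask: str    # write mask, a subset of "xyzw"
    srcs: list   # list of (register name, 4-character swizzle) pairs

def needed_source_components(instr, live_mask, swizzle):
    """Source components read in order to produce the components in live_mask."""
    if instr.op == "DP3":
        # A 3-component dot product reads the first three swizzled
        # components no matter which destination components are written.
        return {swizzle[i] for i in range(3)} if live_mask else set()
    # Component-wise opcodes: destination component i reads swizzle[i].
    return {swizzle[COMPONENTS.index(c)] for c in live_mask}

def prune_write_masks(program, outputs):
    """Walk the program backwards and trim write masks to what is still needed.

    outputs maps a register name to the components the program must produce.
    Instructions whose write mask becomes empty are dropped (dead code).
    """
    live = {reg: set(comps) for reg, comps in outputs.items()}
    kept = []
    for instr in reversed(program):
        wanted = live.get(instr.dst, set())
        new_mask = "".join(c for c in COMPONENTS
                           if c in instr.mask and c in wanted)
        if not new_mask:
            continue  # nothing written here is read later
        # Components defined here no longer need to come from earlier code.
        live[instr.dst] = wanted - set(new_mask)
        # The sources actually read by the surviving components become live.
        for reg, swz in instr.srcs:
            live.setdefault(reg, set()).update(
                needed_source_components(instr, new_mask, swz))
        kept.append(Instr(instr.op, instr.dst, new_mask, instr.srcs))
    kept.reverse()
    return kept
```

Given a program that ends in a write to out.xy, an unused ADD into another
temporary is dropped entirely, and the masks of the instructions feeding
out shrink to xy.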

> 
> > Following that, instructions can be divided into two groups:
> > - instructions that have fixed output and thus determine which
> >   components of temporary registers must be fixed
> > - instructions where all result components correspond to the same
> >   calculation (mad, xpd, ...)
> >
> > By properly combining these two you'd get optimal temporary register
> > usage. IIRC, the problem I did not solve was how to rearrange the
> > instructions of two or more distinct calculations that join up
> > later in the program so that you'd use the minimal number of
> > temporary registers.
> 
> That problem is exactly why getting optimal temporary register usage
> *isn't* that simple ;)
> 
> I recall that there's an algorithm based on dynamic programming which
> does it. In general though, I have a feeling that we're too often
> trying to get a perfect solution in the first cut. I'd much rather
> add simple but useful optimizations at first.

Yep. It might actually work well enough without reordering. There
are/were some cases where fixed pipeline programs used more than 24
temps. I'm not sure about the latest generation, but using more temps
on r4xx does not slow down fragment programs.
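
As a rough sketch of what "without reordering" could mean in practice,
the Python below assigns hardware temps in program order and reuses a temp
as soon as its value has been read for the last time. The assign_temps
name and the (dst, srcs) tuple format are made up for this example. It
assumes each virtual temp is written exactly once, that dead code has
already been removed, that an instruction may write the register it reads,
and it tracks whole registers rather than individual components.

```python
import heapq

def assign_temps(program):
    """Map virtual temps to hardware temp indices without reordering.

    program is a list of (dst, srcs) pairs using virtual register names.
    Names that are read but never written (inputs, constants) are ignored.
    Returns (mapping, number_of_hardware_temps_used).
    """
    # Find the index of the last instruction that touches each register.
    last_use = {}
    for i, (dst, srcs) in enumerate(program):
        last_use.setdefault(dst, i)
        for reg in srcs:
            last_use[reg] = i

    free = []        # min-heap of hardware indices available for reuse
    mapping = {}
    next_index = 0
    for i, (dst, srcs) in enumerate(program):
        # Release temps whose value is read here for the last time.
        for reg in set(srcs):
            if reg in mapping and last_use[reg] == i:
                heapq.heappush(free, mapping[reg])
        # Prefer the lowest released index, otherwise take a fresh one.
        if free:
            mapping[dst] = heapq.heappop(free)
        else:
            mapping[dst] = next_index
            next_index += 1
    return mapping, next_index
```

The count it returns is only what this greedy in-order assignment needs.
Rearranging independent calculations, as discussed above, could lower it
further.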

-- 
Aapo Tahkola

_______________________________________________
Mesa3d-dev mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev
