peephole optimizer [was Re: Performance problem]

Matthew Danish Wed, 25 Sep 2002 17:33:20 -0700


On Wed, Sep 25, 2002 at 10:28:58PM +0200, Edi Weitz wrote:
> 
> Mario Mommer <[EMAIL PROTECTED]> writes:
> 
> > I know that gcc uses lots of "peephole" optimizations. That is, they
> > substitute sequences of assembly with other sequences "known" to be
> > faster. I think that the reason is that in x86 there is a lot more
> > voodoo available than pipelining and cache-line-filling, like
> > implicit over-allocation of registers, etc (1).
> 
> Wouldn't that be also an option for CL compilers? Are there any CL
> implementations out there which do "peephole" optimizations?
> 
> (Disclaimer: I have no idea what I'm talking about. The last assembler
> programs I wrote were for the 6502 and I've no experiences with
> implementing compilers. I just thought this might be a route feasible
> for CMUCL or other CL compilers.)


Python doesn't currently have a peephole optimizer, and I've been
looking at trying to fit one in.  Python never "gathers" all the
assembly before outputting it to machine code, it calls the VOP
functions which in turn call instruction-emitters which output machine
code directly into a vector.  There are two mechanisms for introducing
changes into this vector after the VOPs have been called: backpatches
and choosers.  Backpatches can alter a portion of the vector, and are
generally used for providing the final address for a branching
instruction.  Choosers have the ability to change the size of a portion
of the vector, possibly shrinking it if necessary.  As far as I can
tell, using one of these mechanisms would be the simplest way of
introducing peephole optimizations.  Code with possible need of
optimization can be identified as it is emitted, and then later on it
can be checked to see if it is still possible or not.

Example:  (this is a very common and simple sequence, occurring along
VOP boundaries)

As the code is being emitted, it might look like this:

L1:     MOV     EBX, ESP
L2:     MOV     ESP, EBX

Right now, nothing can be optimized here, due to the possible branch to
L2.

But later on, when labels are resolved and unnecessary ones eliminated,
it might look like this:

L1:     MOV     EBX, ESP
        MOV     ESP, EBX

Now, clearly, the second instruction is extraneous.

In any case I haven't had much time lately to work on it, but I do mean
to get around to it sometime.  Been hacking around with the SBCL
sources, but this should be transferable back to CMUCL.  I'm not a
Python expert, so if someone who knows better than me thinks this
approach is wrong, please let me know.

-- 
; Matthew Danish <[EMAIL PROTECTED]>
; OpenPGP public key: C24B6010 on keyring.debian.org
; Signed or encrypted mail welcome.
; "There is no dark side of the moon really; matter of fact, it's all dark."

peephole optimizer [was Re: Performance problem]

Reply via email to