> Guillermo Adrián Molina writes:
>  > > Sets the steps for processing. However the spill worklist has some
>  > > registers on it that shouldn't be spilled, so it tries to select a
>  > > register to spill. It discards all registers then fails.
>  > >
>  > > I'd see if there are any moves that might be spilled afterwards,
>  > > if so, then all you'd need to do is allow spillRegister to fail
>  > > gracefully.
>  > >
>  >
>  > Ok, I will try to see what is happening. Is there any hard limit
>  > (besides the number of available registers in x86 arch)?
>
> There should be no limit on the number of registers you can use. The
> worst that should happen is you end up with a lot of spill code.
>
>  > >  > Another thing: do you want the code I made for cmovxx?
>  > >
>  > > I'm interested.
>  > >
>  > > Does it have unit test coverage? Exupery development relies on
>  > > testing so that's required.
>  > >
>  > Not right now, I will work on that later. When I have it I will send
>  > it to you.
>
> OK
>
>  > > When was cmov introduced? I know it was a long time ago but can't
>  > > remember precisely when. What I'm concerned with is making Exupery
>  > > incompatible with some chips that might still be being used.
>  > >
>  >
>  > Intel's optimization manual says that cmov was introduced in the
>  > Pentium Pro, and AMD's optimization manual says that cmov is
>  > available from the Athlon. I actually didn't investigate that
>  > thoroughly. The fact is that any modern computer should have it. I
>  > know that in earlier implementations of cmov (Pentium Pro) using the
>  > instruction wasn't really an advantage. But now it is really faster.
>  > My tinyBenchmarks showed a speed up of 10% when I implemented cmov
>  > for SmallInteger additions.
>  > If you are really concerned about compatibility, though, I think you
>  > would be better off not using it.
>
> I'm surprised that your SmallInteger addition code was helped.
>
> In Exupery the SmallInteger addition sequence is
>    bitTest arg1
>    jumpIfSet failureBlock
>    bitTest arg2
>    jumpIfSet failureBlock
>    clearTagBit arg1
>    add arg1 arg2
>    jumpOverflow failureBlock
>
> The failure case is a full message send.
>
The problem with the above code is that you have 3 branches.
That is why I need jump tables: there are cases where cmov really doesn't help.
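
To make the three branches concrete, here is a rough C rendering of that sequence. It is only a sketch: it assumes Squeak-style tagging (low bit set means SmallInteger, value in the upper 31 bits), and `send_plus` is a made-up stand-in for the full message send on the failure path, not anything from Exupery.

```c
#include <stdint.h>

/* Assumption: Squeak-style tagging, low bit 1 marks a SmallInteger. */
#define IS_SMALL_INT(oop) (((oop) & 1) != 0)

/* Hypothetical stand-in for the full message send (the failure block).
   It returns 0 here only so the failure path is visible to a caller. */
static int32_t send_plus(int32_t rcvr, int32_t arg)
{
    (void)rcvr;
    (void)arg;
    return 0;
}

static int32_t tagged_add(int32_t rcvr, int32_t arg)
{
    int64_t wide;

    if (!IS_SMALL_INT(rcvr))              /* branch 1: tag test on arg1 */
        return send_plus(rcvr, arg);
    if (!IS_SMALL_INT(arg))               /* branch 2: tag test on arg2 */
        return send_plus(rcvr, arg);

    /* Clear one tag bit so the sum carries exactly one tag. */
    wide = (int64_t)(rcvr & ~1) + (int64_t)arg;

    if (wide < INT32_MIN || wide > INT32_MAX)   /* branch 3: overflow */
        return send_plus(rcvr, arg);
    return (int32_t)wide;
}
```

Each `if` corresponds to one conditional jump in the sequence quoted above; the 64-bit widening is just a portable way to detect what the hardware overflow flag reports.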

Before I started using Exupery, I called special methods in C that
implemented faster code. Every special method (and primitive) returned 1
in case of an error and, on success, returned the result object.
One of these special methods was +. This is part of the code:

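if(areIntegers(rcvr,arg)) {
        int result;
        /* result = rcvr + arg, or 1 (the error value) on signed overflow */
        asm(    "movl $1,%%edx\n\t"
                "movl %[rcvr],%[result]\n\t"
                "addl %[arg],%[result]\n\t"
                "cmovol %%edx,%[result]"
                /* "=&r": result is written before arg is read, so it
                   must not share a register with an input */
                : [result] "=&r" (result)
                : [rcvr] "r" (rcvr), [arg] "r" (arg)
                : "edx", "cc" );
        return result;
}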

With this code, I got up to 10% faster code in +-intensive tests.
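
For anyone who wants to try the same idea without inline assembly, a portable sketch follows. It works on plain 32-bit ints and keeps 1 as the error value, as in the code above; the name `add_or_fail` is made up for this example. Whether the final select becomes a cmov is up to the compiler, so treat this as an illustration of the technique rather than a guaranteed equivalent.

```c
#include <stdint.h>

/* Hypothetical helper: rcvr + arg, or 1 (the error value) when the
   signed 32-bit addition overflows. */
static int32_t add_or_fail(int32_t rcvr, int32_t arg)
{
    uint32_t a = (uint32_t)rcvr;
    uint32_t b = (uint32_t)arg;

    /* Unsigned addition wraps, so this is well defined even when the
       signed addition would overflow. */
    uint32_t sum = a + b;

    /* Signed overflow happened exactly when both operands have the same
       sign and the sign of the sum differs from it. */
    uint32_t overflow = (~(a ^ b) & (a ^ sum)) >> 31;

    /* A plain select; compilers commonly turn this into cmov on x86.
       The signed addition is only evaluated when it cannot overflow. */
    return overflow ? 1 : rcvr + arg;
}
```

Note that with untagged ints a legitimate result of 1 cannot be told apart from the error value; the scheme only works when 1 is never a valid result object.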


> There are code fragments where cmov would be helpful. Converting
> to a boolean comes to mind. The part of "a > b" where you're loading
> either true or false into the result register.
>

Yes, I implemented that with Exupery (code for less-than, "<"):

self addExpression:  (MedMov
        from: (self literal: false)
        to: answer      ).
trueReg := machine createTemporaryRegister.
self addExpression:  (MedMov
        from: (self literal: true)
        to: trueReg     ).
self addExpression:  (MedComparision
        operator: #cmp
        arg1: arg1
        arg2: arg2).
self addExpression:  (MedCMov
        type: #cmovl
        from: trueReg
        to: answer).

This gave me an impressive improvement (up to 40-50%) when I implemented
all the SmallInteger comparisons this way, because, as you know, we don't
need to detag before comparing.
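
The reason detagging can be skipped is that tagging preserves order. A small C sketch of the same load-false, load-true, compare, conditional-move shape is below. It assumes Squeak-style tagging (value * 2 + 1), and the constants `FALSE_OOP` and `TRUE_OOP` are made-up placeholders for the false and true literals, not real object references.

```c
#include <stdint.h>

/* Assumption: Squeak-style tagged SmallIntegers, value * 2 + 1.
   Only valid for values that fit in 31 bits. */
static int32_t tag_small_int(int32_t value)
{
    return value * 2 + 1;
}

/* Hypothetical object references standing in for the two literals. */
static const int32_t FALSE_OOP = 0x1000;
static const int32_t TRUE_OOP  = 0x2000;

static int32_t less_than(int32_t tagged_a, int32_t tagged_b)
{
    int32_t answer   = FALSE_OOP;   /* MedMov false -> answer  */
    int32_t true_reg = TRUE_OOP;    /* MedMov true  -> trueReg */

    /* v -> 2v + 1 is strictly increasing, so comparing the tagged
       values gives the same result as comparing the untagged ones.
       The select mirrors the cmp followed by cmovl. */
    answer = (tagged_a < tagged_b) ? true_reg : answer;
    return answer;
}
```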


>  > > Given adequate test coverage I'll add it.
>  >
>  > I also implemented enter and leave instructions. Not because they
>  > are better (they aren't), but because I use them to signal the
>  > inclusion of additional prologue and epilogue code in a final phase
>  > added just after the allocator. I do it that way because I don't
>  > know until then which registers are used and how many additional
>  > temps are needed. I know that Exupery always pushes and pops all the
>  > registers (other than eax, edx and ecx), and that it makes room for
>  > a big context as temp space on the stack. I don't do that. I only
>  > push the used regs, and if that is not enough, I enter additional
>  > stack space. That breaks compatibility with the original Exupery,
>  > but I wanted to implement it that way. For small methods, that is
>  > really better.
>  > So, given that, I am not offering any of this to you. I think you'll
>  > understand.
>
> Exupery's prolog and epilogue sequences could be improved. I've been
> thinking about overhauling that area for a few years now. I'd like
> to have variables spill into their actual locations. So if a stack
> variable was stored, it would always be fetched from the context.
> Then spilled registers wouldn't need to be loaded and stored on
> context switches.
>
> One thing that I might do in 0.13 is colour the isolated parts of a
> method separately. That should improve register allocation as the
> interference graph will not be polluted by other isolated sections of
> code. A compiled method is often made up of completely isolated
> sections of code. Colouring the sections separately should also speed
> up register allocation.
>

Every improvement you make will help me.
Cheers, Guille


> Bryce
> _______________________________________________
> Exupery mailing list
> [email protected]
> http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery
>

