On Sunday, 23 March 2014, 19:46:24, Bernd Paysan wrote:
> On Sunday, 23 March 2014, 18:38:58, David Kuehling wrote:
> > Replying to myself, quick update (before I have to shut down my computer
> > for today):
> > 
> > The instruction in question is 'rdhwr v1,$29' which is mips32r2, i.e.
> > not supported on Loongson2f.  GCC outputs it via a sequence like:
> >         .set    push
> >         .set    mips32r2
> >         rdhwr   $3,$29
> >         .set    pop
> > 
> > I guess on MIPS the GCC runtime nowadays uses hardware register $29
> > (which is not CPU register $29!) for addressing thread-local storage.
> > To support older MIPS CPUs, the kernel emulates the instruction via an
> > invalid-opcode trap, i.e. it is very slow.  How can we prevent writes
> > to thread-local storage from creeping into goto*?
> 
> This stuff is copied from the first NEXT, i.e. the thing between
> before_goto: and after_goto:
> 
> #define FIRST_NEXT_P2 NEXT_P1_5; GOTO_ALIGN; \
> before_goto: goto *real_ca; after_goto:
> 
> Suggestion: Add an "asm volatile("": : :"memory")" before "before_goto:"
> 
> That should scare GCC away from moving stuff behind it.

I've looked at what GCC does on ARM and x86_64, and it moves some code into 
the goto there as well, less on x86_64, more on ARM.  It's not as bad as in 
your case (with an emulated instruction), but it's still extra code.  
asm __volatile__ ("": : :"memory") doesn't prevent it.  Neither does calling 
a dummy function.
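
For reference, a minimal sketch of what such a barrier does and does not 
promise.  COMPILER_BARRIER, shared_counter and bump_twice are made-up names 
for illustration, not Gforth code.  The "memory" clobber only tells GCC that 
the asm may read or write memory, so memory accesses are not cached or moved 
across it; as far as I can tell it places no constraint on register-only work 
such as address computations, which would fit with it not helping here.

```c
/* Sketch only: requires GCC or Clang (extended asm). */

/* Hypothetical macro name; the thread only gives the asm statement itself. */
#define COMPILER_BARRIER() __asm__ __volatile__("" : : : "memory")

/* Global so the compiler has to treat it as memory the asm might touch. */
int shared_counter;

int bump_twice(void)
{
    shared_counter += 1;   /* this store must be emitted before the barrier */
    COMPILER_BARRIER();    /* emits no instructions, only constrains GCC    */
    shared_counter += 1;   /* the value is reloaded from memory after it    */
    return shared_counter;
}
```

Without the barrier GCC is free to fold the two increments into a single 
"+= 2"; with it, both stores stay separate.  Computations that live purely in 
registers are not covered by the clobber.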

What did the trick?  Actually using FIRST_NEXT in after_last:.  That label is 
a dummy that only marks the end of the last primitive's code, so we can put 
anything we like there.  Doing FIRST_NEXT there makes it a no-op, and since 
there is nothing left to move into the goto, it stays as small as it should.
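
For readers who have not met the construct: "goto *real_ca" relies on GCC's 
labels-as-values extension.  Below is a toy dispatch loop in that style, 
assuming GCC or Clang.  The opcode names, the NEXT macro as written here and 
run() are hypothetical and far simpler than Gforth's engine; the point is only 
that every handler ends in its own indirect goto, which is the spot the 
compiler may schedule other instructions into.

```c
/* Sketch only: requires GCC or Clang (labels as values, computed goto). */

/* Hypothetical opcodes for a one-register toy VM. */
enum { OP_INC, OP_DOUBLE, OP_HALT };

/* Runs a program of valid opcodes that ends in OP_HALT. */
long run(const unsigned char *program, long acc)
{
    /* Table of handler addresses, indexed by opcode. */
    static void *const dispatch[] = { &&op_inc, &&op_double, &&op_halt };
    const unsigned char *ip = program;

/* Fetch the next opcode and jump straight to its handler. */
#define NEXT goto *dispatch[*ip++]

    NEXT;                      /* first dispatch */
op_inc:
    acc += 1;
    NEXT;
op_double:
    acc *= 2;
    NEXT;
op_halt:
    return acc;

#undef NEXT
}
```

Running the program { OP_INC, OP_DOUBLE, OP_INC, OP_HALT } with acc = 3 
computes (3 + 1) * 2 + 1.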

On the Core i7, I see no difference (the two leas and the one write are 
swallowed by the sheer power of the Core i7), but on my Galaxy Note II, this 
gives a very clear and significant speedup:

 0.575 0.710  0.365 0.750 0.390 2014-03-24; Exynos 4 Quad 1.6GHz; gcc-4.8.x (Android 4.3)
 0.735 0.920  0.900 1.110 0.690 2012-10-31; Exynos 4 Quad 1.6GHz; gcc-4.6.x (Android 4.1.1)

-- 
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/
