Re: [RFC PATCH v4 00/11] powerpc: switch VDSO to C implementation.

Segher Boessenkool Mon, 20 Jan 2020 09:28:58 -0800

On Mon, Jan 20, 2020 at 06:08:23PM +0100, Christophe Leroy wrote:
> Not easy I think.
> 
> First we have the unavoidable ASM entry function that can't be dropped
> because of the CR[SO] bit, which must be set on error and cleared on
> success, and that can't be done in C.

Yup.
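Background on the constraint: on powerpc, a system call (and a VDSO entry point that mimics one) reports failure by setting CR0[SO] and returning a positive errno in r3. C has no way to set a condition-register bit on return, so a small asm shim has to do it. The sketch below models that contract in plain C. It assumes, hypothetically, that the C body returns 0 or a negative errno; `struct ppc_ret`, `c_clock_gettime_model` and `asm_entry_model` are illustrative names, not kernel code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of what the asm entry hands back to userspace:
 * r3 holds the value, so mirrors CR0[SO]. */
struct ppc_ret {
    long r3;
    int so;   /* 1 = error (SO set), 0 = success (SO clear) */
};

/* Hypothetical C body: returns 0 on success or -errno on failure,
 * as ordinary kernel-style C code would. */
static long c_clock_gettime_model(int clock_id)
{
    if (clock_id < 0)
        return -EINVAL;
    return 0;
}

/* Model of the asm shim: the one thing C cannot express is setting or
 * clearing CR0[SO], so a negative return becomes SO=1 with r3 = +errno. */
static struct ppc_ret asm_entry_model(int clock_id)
{
    long ret = c_clock_gettime_model(clock_id);
    struct ppc_ret out;

    if (ret < 0) {
        out.r3 = -ret;   /* errno is reported as a positive number */
        out.so = 1;
    } else {
        out.r3 = ret;
        out.so = 0;
    }
    assert(out.r3 >= 0);
    return out;
}
```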

> In our ASM VDSO the shifts are fixed, while in the generic C VDSO the
> shift amounts are read from the VDSO data.

Does that cost more than just a few cycles?
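A simplified C model of the two approaches, to make the difference concrete. Both compute nanoseconds as (base + delta * mult) >> shift; the only change is whether `shift` is a run-time load or a build-time constant. With a constant, the compiler can use immediate-form rotate/mask instructions for the 64-bit shift on 32-bit powerpc; with a load, it must emit a variable-count sequence. The struct, its field names and the value 24 are hypothetical, chosen for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-clock VDSO data. */
struct vdso_clock_model {
    uint64_t cycle_last;  /* timebase value at the last update */
    uint64_t base_nsec;   /* shifted nanoseconds at cycle_last */
    uint32_t mult;        /* clocksource multiplier */
    uint32_t shift;       /* clocksource shift, read at run time */
};

/* Generic style: the shift count comes from memory, so the compiler
 * must generate a variable-count 64-bit shift. */
static uint64_t ns_variable_shift(const struct vdso_clock_model *vd,
                                  uint64_t now)
{
    uint64_t ns = vd->base_nsec + (now - vd->cycle_last) * vd->mult;

    assert(vd->shift < 64);
    return ns >> vd->shift;
}

/* Hypothetical shift baked in at build time, as the asm version does. */
#define MODEL_FIXED_SHIFT 24

static uint64_t ns_fixed_shift(const struct vdso_clock_model *vd,
                               uint64_t now)
{
    uint64_t ns = vd->base_nsec + (now - vd->cycle_last) * vd->mult;

    return ns >> MODEL_FIXED_SHIFT;
}
```

When `vd->shift` happens to equal the constant, both functions return the same value; any difference is purely in the generated instruction sequence.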

> And there is still some funny code generated by GCC (8.1), like:
> 
>  620: 7d 29 3c 30     srw     r9,r9,r7
>  624: 21 87 00 20     subfic  r12,r7,32
>  628: 7d 07 3c 31     srw.    r7,r8,r7
>  62c: 7d 08 60 30     slw     r8,r8,r12
>  630: 7d 0b 4b 78     or      r11,r8,r9

(For fixed shifts this can be done more cheaply: you can use rlwimi then.)
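The quoted sequence is a 64-bit logical right shift of a register pair (hi in r8, lo in r9) by a variable count in r7. A C model of it, next to a fixed-count version built from rlwinm/rlwimi semantics, follows. The helper names and the constant `FIXED_N` are hypothetical; `rlwimi_model` follows the Power ISA definition (rotate left, then insert under a mask in IBM bit numbering, bit 0 = MSB).

```c
#include <assert.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned r)
{
    r &= 31;
    return r ? (x << r) | (x >> (32 - r)) : x;
}

/* IBM bit numbering (bit 0 is the MSB): ones in bits mb..me, mb <= me. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
    return (0xffffffffu >> mb) & (0xffffffffu << (31 - me));
}

/* rlwimi ra,rs,sh,mb,me: rotate rs left by sh, insert into ra under mask. */
static uint32_t rlwimi_model(uint32_t ra, uint32_t rs, unsigned sh,
                             unsigned mb, unsigned me)
{
    uint32_t m = ppc_mask(mb, me);
    return (rotl32(rs, sh) & m) | (ra & ~m);
}

/* Variable count, mirroring: srw r9,r9,r7; subfic r12,r7,32;
 * srw. r7,r8,r7; slw r8,r8,r12; or r11,r8,r9 */
static void shr64_variable(uint32_t hi, uint32_t lo, unsigned s,
                           uint32_t *nhi, uint32_t *nlo)
{
    assert(s < 32);
    uint32_t lo_part = lo >> s;
    /* slw by 32 yields 0 on powerpc; in C a 32-bit shift by 32 is UB */
    uint32_t hi_part = s ? hi << (32 - s) : 0;
    *nlo = hi_part | lo_part;
    *nhi = hi >> s;   /* srw. also sets CR0, feeding the later bne */
}

#define FIXED_N 24   /* hypothetical fixed shift, must be 1..31 */

/* Fixed count: three instructions, no subfic and no separate or. */
static void shr64_fixed(uint32_t hi, uint32_t lo,
                        uint32_t *nhi, uint32_t *nlo)
{
    /* srwi t,lo,N, which is rlwinm t,lo,32-N,N,31 */
    uint32_t t = rotl32(lo, 32 - FIXED_N) & ppc_mask(FIXED_N, 31);
    /* rlwimi t,hi,32-N,0,N-1: hi's low N bits land in t's top N bits */
    *nlo = rlwimi_model(t, hi, 32 - FIXED_N, 0, FIXED_N - 1);
    /* srwi nhi,hi,N */
    *nhi = hi >> FIXED_N;
}
```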

>  634: 39 40 00 00     li      r10,0
>  638: 40 82 00 84     bne     6bc <__c_kernel_clock_gettime+0x114>
>  63c: 81 23 00 24     lwz     r9,36(r3)
>  640: 81 05 00 00     lwz     r8,0(r5)
> ...
>  6bc: 7d 69 5b 78     mr      r9,r11
>  6c0: 7c ea 3b 78     mr      r10,r7
>  6c4: 7d 2b 4b 78     mr      r11,r9
>  6c8: 4b ff ff 74     b       63c <__c_kernel_clock_gettime+0x94>
> 
> This branch to 6bc is totally useless:
> - copying r11 into r9 is pointless as r9 is overwritten in 63c
> - copying r9 back into r11 is pointless as r11 has not been modified
> in between.

Yeah, huh, how did that happen.

> - loading r10 with 0 and then overwriting r10 with r7 when r7 is not 0
> is pointless as well; the result of srw. could have gone directly into r10.

This may be harder to make the compiler do.

But the r9/r11 thing suggests you are preventing optimisation somewhere,
maybe with some asm?  Do you have some small testcase I can compile?


Segher
