On 22/10/2019 17.01, Christophe Leroy wrote:
> 
> 
> On 10/18/2019 12:52 PM, Rasmus Villemoes wrote:
>> In preparation for allowing to build QE support for architectures
>> other than PPC, replace the ppc-specific io accessors. Done via
>>
> 
> This patch is not transparent in terms of performance, functions get
> changed significantly.
> 
> Before the patch:
> 
> 00000330 <ucc_fast_enable>:
>  330:    81 43 00 04     lwz     r10,4(r3)
>  334:    7c 00 04 ac     hwsync
>  338:    81 2a 00 00     lwz     r9,0(r10)
>  33c:    0c 09 00 00     twi     0,r9,0
>  340:    4c 00 01 2c     isync
>  344:    70 88 00 02     andi.   r8,r4,2
>  348:    41 82 00 10     beq     358 <ucc_fast_enable+0x28>
>  34c:    39 00 00 01     li      r8,1
>  350:    91 03 00 10     stw     r8,16(r3)
>  354:    61 29 00 10     ori     r9,r9,16
>  358:    70 88 00 01     andi.   r8,r4,1
>  35c:    41 82 00 10     beq     36c <ucc_fast_enable+0x3c>
>  360:    39 00 00 01     li      r8,1
>  364:    91 03 00 14     stw     r8,20(r3)
>  368:    61 29 00 20     ori     r9,r9,32
>  36c:    7c 00 04 ac     hwsync
>  370:    91 2a 00 00     stw     r9,0(r10)
>  374:    4e 80 00 20     blr
> 
> After the patch:
> 
> 0000030c <ucc_fast_enable>:
>  30c:    94 21 ff e0     stwu    r1,-32(r1)
>  310:    7c 08 02 a6     mflr    r0
>  314:    bf a1 00 14     stmw    r29,20(r1)
>  318:    7c 9f 23 78     mr      r31,r4
>  31c:    90 01 00 24     stw     r0,36(r1)
>  320:    7c 7e 1b 78     mr      r30,r3
>  324:    83 a3 00 04     lwz     r29,4(r3)
>  328:    7f a3 eb 78     mr      r3,r29
>  32c:    48 00 00 01     bl      32c <ucc_fast_enable+0x20>
>             32c: R_PPC_REL24    ioread32be
>  330:    73 e9 00 02     andi.   r9,r31,2
>  334:    41 82 00 10     beq     344 <ucc_fast_enable+0x38>
>  338:    39 20 00 01     li      r9,1
>  33c:    91 3e 00 10     stw     r9,16(r30)
>  340:    60 63 00 10     ori     r3,r3,16
>  344:    73 e9 00 01     andi.   r9,r31,1
>  348:    41 82 00 10     beq     358 <ucc_fast_enable+0x4c>
>  34c:    39 20 00 01     li      r9,1
>  350:    91 3e 00 14     stw     r9,20(r30)
>  354:    60 63 00 20     ori     r3,r3,32
>  358:    80 01 00 24     lwz     r0,36(r1)
>  35c:    7f a4 eb 78     mr      r4,r29
>  360:    bb a1 00 14     lmw     r29,20(r1)
>  364:    7c 08 03 a6     mtlr    r0
>  368:    38 21 00 20     addi    r1,r1,32
>  36c:    48 00 00 00     b       36c <ucc_fast_enable+0x60>
>             36c: R_PPC_REL24    iowrite32be

True. Do you know why powerpc uses out-of-line versions of these
accessors when !PPC_INDIRECT_PIO, i.e. at least all of PPC32? It's quite
a bit beyond the scope of this series, but I'd expect moving most if not
all of arch/powerpc/kernel/iomap.c into asm/io.h (guarded by
!defined(CONFIG_PPC_INDIRECT_PIO) of course) as static inlines would
benefit all ppc32 users of iowrite32 and friends.
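
To make the inlining idea concrete, here is a rough user-space sketch of what header-only big-endian accessors could look like. Every name with a sketch_ prefix is hypothetical, and none of this is copied from arch/powerpc/kernel/iomap.c or asm/io.h. It also leaves out the ordering instructions (the hwsync and twi/isync visible in the "before" disassembly), so it only shows the shape of the code, not a drop-in replacement.

```c
#include <stdint.h>

/* Hypothetical helper: plain 32-bit byte swap. */
static inline uint32_t sketch_swab32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Hypothetical helper: run-time check of host byte order. */
static inline int sketch_host_is_le(void)
{
	const uint16_t probe = 1;

	return *(const uint8_t *)&probe == 1;
}

/*
 * Assumed shape of an inline big-endian read: a single volatile load,
 * swapped only on little-endian hosts. No barriers are emitted here.
 */
static inline uint32_t sketch_ioread32be(const volatile void *addr)
{
	uint32_t v = *(const volatile uint32_t *)addr;

	return sketch_host_is_le() ? sketch_swab32(v) : v;
}

/* Assumed shape of an inline big-endian write, same caveats as above. */
static inline void sketch_iowrite32be(uint32_t val, volatile void *addr)
{
	*(volatile uint32_t *)addr =
		sketch_host_is_le() ? sketch_swab32(val) : val;
}
```

Because both helpers are static inline, a caller such as ucc_fast_enable() would in principle no longer need a stack frame or the bl/b to out-of-line functions seen in the "after" disassembly.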

Is there some other primitive available that (a) is defined on all
architectures (or at least both ppc and arm) and (b) expands to good
code in both/all cases?

Note that a few uses of the iowrite32be accessors have already
appeared in the qe code with the introduction of the qe_clrsetbits()
helpers in bb8b2062af.
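
For readers without that commit at hand, the following is only a sketch of the read-modify-write pattern that a clrsetbits-style helper implies. It is not the code from bb8b2062af. The sketch_ names are hypothetical, and the byte-wise big-endian load/store stands in for ioread32be/iowrite32be so that the example runs in user space on any host.

```c
#include <stdint.h>

/* Hypothetical stand-in for a big-endian 32-bit read. */
static inline uint32_t sketch_read_be32(const volatile uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical stand-in for a big-endian 32-bit write. */
static inline void sketch_write_be32(uint32_t v, volatile uint8_t *p)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Read the register, clear the 'clear' bits, set the 'set' bits, write back. */
static inline void sketch_clrsetbits_be32(volatile uint8_t *reg,
					  uint32_t clear, uint32_t set)
{
	sketch_write_be32((sketch_read_be32(reg) & ~clear) | set, reg);
}
```

Each call costs one read and one write through the accessors, so whether those accessors are inlined matters for every user of such a helper.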

Rasmus
