On Mon, 2016-05-30 at 09:08 +1000, Anton Blanchard via Linuxppc-dev wrote:
> > That is surprising, do we have any idea what specifically increases
> > the overhead so significantly ? Does gcc know about ldbrx/stdbrx ? I
> > notice in our io.h for example we still do manual ld/std + swap
> > because old processors didn't know these, we should fix that for
> > CONFIG_POWER8 (or is it POWER7 that brought these ?).
>
> The futex issue seems to be __get_user_pages_fast():
>
>         ld      r11,0(r6)
>         ...
>         rldicl  r8,r11,32,32
>         rotlwi  r28,r11,24
>         rlwimi  r28,r11,8,8,15
>         rotlwi  r6,r8,24
>         rlwimi  r28,r11,8,24,31
>         rlwimi  r6,r8,8,8,15
>         rlwimi  r6,r8,8,24,31
>         rldicr  r28,r28,32,31
>         or      r28,r28,r6
>         cmpdi   cr7,r28,0
>         beq     cr7,2428
>
> That's a whole lot of work just to check if a pte is zero. I assume
> the reason gcc can't replace this with a byte reversed load is that
> we access the pte via the READ_ONCE() macro.

Did I mention we need a bswap instruction?

We can possibly improve some of them by doing the comparison on the raw
value, e.g. see hash__pte_same().

Is the above from pgd_none()?

cheers
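
To illustrate what comparing on the raw value means, here is a rough
userspace sketch. It is not the kernel's code: demo_pte_t, demo_bswap64()
and the other demo_* names are made up for the example, and the mask value
in demo_check() is arbitrary. The point is only that byte reversal maps
zero to zero and commutes with XOR and AND, so a zero test, an equality
test, or a masked equality test gives the same answer on the unswapped
value, provided any constant mask is swapped once up front.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a pte kept in memory in the "other" endianness. */
typedef struct { uint64_t raw; } demo_pte_t;

/* Portable 64-bit byte reversal, for illustration only. */
static uint64_t demo_bswap64(uint64_t v)
{
	v = ((v & 0x00ff00ff00ff00ffULL) << 8)  | ((v >> 8)  & 0x00ff00ff00ff00ffULL);
	v = ((v & 0x0000ffff0000ffffULL) << 16) | ((v >> 16) & 0x0000ffff0000ffffULL);
	return (v << 32) | (v >> 32);
}

/* Swap first, then test: the pattern that costs a full byte reversal. */
static int demo_pte_none_swapped(demo_pte_t pte)
{
	return demo_bswap64(pte.raw) == 0;
}

/* Test the raw value directly: bswap(x) == 0 exactly when x == 0. */
static int demo_pte_none_raw(demo_pte_t pte)
{
	return pte.raw == 0;
}

/* Masked comparison done the slow way, on swapped values. */
static int demo_pte_same_swapped(demo_pte_t a, demo_pte_t b, uint64_t ignore)
{
	return ((demo_bswap64(a.raw) ^ demo_bswap64(b.raw)) & ~ignore) == 0;
}

/*
 * Masked comparison on raw values. The mask is swapped instead of the
 * data; with a constant mask that swap can happen at compile time.
 */
static int demo_pte_same_raw(demo_pte_t a, demo_pte_t b, uint64_t ignore)
{
	return ((a.raw ^ b.raw) & ~demo_bswap64(ignore)) == 0;
}

/* Usage: both forms must agree for any input. */
static void demo_check(void)
{
	demo_pte_t zero = { 0 };
	demo_pte_t a = { 0x0102030405060708ULL };
	demo_pte_t b = { 0x0102030405060709ULL };
	uint64_t ignore = 0xffULL << 56;	/* arbitrary example mask */

	assert(demo_pte_none_raw(zero) == demo_pte_none_swapped(zero));
	assert(demo_pte_none_raw(a) == demo_pte_none_swapped(a));
	assert(demo_pte_same_raw(a, b, ignore) == demo_pte_same_swapped(a, b, ignore));
	assert(demo_pte_same_raw(a, a, ignore) == demo_pte_same_swapped(a, a, ignore));
}
```

Whether this helps a given call site depends on what the caller does with
the value afterwards; if the swapped value is needed anyway, the swap is
not avoided, only moved.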