I hate the idea of having a POWER7 FTR bit.  Every loon will (and has
tried to in the past) attach every POWER7 related thing to it, rather
than thinking about what the feature really is for.

What about other processors which could also benefit from this copy
loop?  Turning on CPU_FTR_POWER7 for them is gonna look a bit silly.

As we discussed online, we could call it CPU_FTR_VMX_COPY and start
thinking about a better way to solve the CPU feature bit mess.

But then, most CPUs with VMX will not want that, because it is slower
code for them.  For things like copy loops it makes perfect sense to
have them tuned per CPU core.  For example, this code likes to use
unaligned stores over more complicated shift-and-combine stuff; that
works great on POWER7, but not on much else.

Maybe you should have the various kinds of loop ("source aligned, dest
unaligned, using unaligned stores, 64 bytes") as asm routines, and have
some higher-level code (which can be runtime patched) select which one
to run.

One idea would be to have a structure of function pointers for each
CPU that gets runtime patched into the right places, similar to how we
do some of the MMU fixups.

Sounds good to me :-)
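
To make the idea a bit more concrete, here is a rough sketch in plain C of
a per-core table of copy routines with a small dispatcher on top.  All the
names in it (copy_ops, select_copy_ops, do_copy, the CORE_* values) are
made up for illustration and are not existing kernel symbols, the "asm
routines" are just C stand-ins, and it uses an ordinary indirect call
where the real thing would presumably patch the branch targets at boot.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signature shared by all copy loop variants. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* One entry per CPU core: which routine to use for which case. */
struct copy_ops {
	const char *name;
	copy_fn aligned;	/* destination is 16-byte aligned */
	copy_fn dst_unaligned;	/* destination is not 16-byte aligned */
};

/* Stand-in for a generic asm loop (shift-and-combine and so on). */
static void *copy_generic(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* Stand-in for an asm loop that simply uses unaligned stores. */
static void *copy_unaligned_stores(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (n--)
		*d++ = *s++;
	return dst;
}

/* Assumed core identifiers; a real table would have more entries. */
enum cpu_core { CORE_GENERIC, CORE_POWER7 };

static const struct copy_ops copy_ops_table[] = {
	[CORE_GENERIC] = { "generic", copy_generic, copy_generic },
	[CORE_POWER7]  = { "power7",  copy_generic, copy_unaligned_stores },
};

static const struct copy_ops *cur_copy_ops = &copy_ops_table[CORE_GENERIC];

/* Called once at setup time, after the core has been identified. */
void select_copy_ops(enum cpu_core core)
{
	cur_copy_ops = &copy_ops_table[core];
}

/* Higher-level code: pick the loop variant based on dest alignment. */
void *do_copy(void *dst, const void *src, size_t n)
{
	if (((uintptr_t)dst & 15) == 0)
		return cur_copy_ops->aligned(dst, src, n);
	return cur_copy_ops->dst_unaligned(dst, src, n);
}
```

Setup code would call select_copy_ops() once per boot, and everything
else goes through do_copy().  No feature bit is involved: a core that
wants the unaligned-store loop just gets a table entry that says so.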


Segher

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
