Hello!

> Silvermont processors have penalty for instructions having 4+ bytes of
> prefixes (including escape bytes in opcode). This situation happens when
> REX prefix is used in SSE4 instructions. This patch tries to avoid such
> situation by preferring xmm0-xmm7 usage over xmm8-xmm15 in those
> instructions. I achieved it by adding new tuning flag and new alternatives
> affected by tuning.
>
> SSE4 instructions are not very widely used by GCC but I see some
> significant gains caused by this patch (tested on Avoton on -O3).
>
> 2014-07-02  Ilya Enkovich  <ilya.enkov...@intel.com>
>
>         * config/i386/constraints.md (Yr): New.
>         * config/i386/i386.h (reg_class): Add NO_REX_SSE_REGS.
>         (REG_CLASS_NAMES): Likewise.
>         (REG_CLASS_CONTENTS): Likewise.
>         * config/i386/sse.md (*vec_concatv2sf_sse4_1): Add alternatives
>         which use only NO_REX_SSE_REGS.

You don't need to add alternatives, just change existing alternatives
from "x" to "Yr". The allocator will handle reduced register set just
fine.

BTW: I think that "Yr" is a very confusing name for the alternative.
I'd suggest "Ya".

Uros.
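
For reference, the byte counting behind the penalty can be sketched in C. This is only an illustration, not code from the patch: the function names are hypothetical, and it assumes a legacy-encoded, register-to-register SSE4 instruction from the 0F 38 or 0F 3A opcode map with the mandatory 66 prefix, where REX is needed only because an operand is xmm8-xmm15.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: an XMM register numbered 8..15 needs a REX
   prefix to encode its high bit. */
bool xmm_needs_rex(int regno)
{
    assert(regno >= 0 && regno <= 15);
    return regno >= 8;
}

/* Count prefix and escape bytes for a register-to-register SSE4
   instruction, assuming the encoding 66 [REX] 0F 38/3A opcode. */
int sse4_prefix_bytes(int dst, int src)
{
    int bytes = 1   /* mandatory 0x66 prefix */
              + 2;  /* escape bytes: 0x0F 0x38 or 0x0F 0x3A */

    if (xmm_needs_rex(dst) || xmm_needs_rex(src))
        bytes += 1; /* REX prefix */

    return bytes;
}

/* True when the instruction reaches the 4+ byte threshold described
   in the quoted patch description. */
bool hits_4byte_penalty(int dst, int src)
{
    return sse4_prefix_bytes(dst, src) >= 4;
}
```

Under these assumptions, operands restricted to xmm0-xmm7 stay at 3 bytes of prefixes and escapes, while any use of xmm8-xmm15 brings the count to 4. That is why limiting the register class in those patterns is enough to avoid the penalty.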