On Sun, Mar 27, 2011 at 3:44 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
> Here is a patch to split AVX 32byte unaligned load/store:
>
> http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00743.html
>
> It speeds up some SPEC CPU 2006 benchmarks by up to 6%.
> OK for trunk?
> 2011-02-11  H.J. Lu  <hongjiu...@intel.com>
>
>        * config/i386/i386.c (flag_opts): Add -mavx256-split-unaligned-load
>        and -mavx256-split-unaligned-store.
>        (ix86_option_override_internal): Split 32-byte AVX unaligned
>        load/store by default.
>        (ix86_avx256_split_vector_move_misalign): New.
>        (ix86_expand_vector_move_misalign): Use it.
>
>        * config/i386/i386.opt: Add -mavx256-split-unaligned-load and
>        -mavx256-split-unaligned-store.
>
>        * config/i386/sse.md (*avx_mov<mode>_internal): Verify unaligned
>        256bit load/store.  Generate unaligned store on misaligned memory
>        operand.
>        (*avx_movu<ssemodesuffix><avxmodesuffix>): Verify unaligned
>        256bit load/store.
>        (*avx_movdqu<avxmodesuffix>): Likewise.
>
>        * doc/invoke.texi: Document -mavx256-split-unaligned-load and
>        -mavx256-split-unaligned-store.
>
> gcc/testsuite/
>
> 2011-02-11  H.J. Lu  <hongjiu...@intel.com>
>
>        * gcc.target/i386/avx256-unaligned-load-1.c: New.
>        * gcc.target/i386/avx256-unaligned-load-2.c: Likewise.
>        * gcc.target/i386/avx256-unaligned-load-3.c: Likewise.
>        * gcc.target/i386/avx256-unaligned-load-4.c: Likewise.
>        * gcc.target/i386/avx256-unaligned-load-5.c: Likewise.
>        * gcc.target/i386/avx256-unaligned-load-6.c: Likewise.
>        * gcc.target/i386/avx256-unaligned-load-7.c: Likewise.
>        * gcc.target/i386/avx256-unaligned-store-1.c: Likewise.
>        * gcc.target/i386/avx256-unaligned-store-2.c: Likewise.
>        * gcc.target/i386/avx256-unaligned-store-3.c: Likewise.
>        * gcc.target/i386/avx256-unaligned-store-4.c: Likewise.
>        * gcc.target/i386/avx256-unaligned-store-5.c: Likewise.
>        * gcc.target/i386/avx256-unaligned-store-6.c: Likewise.
>        * gcc.target/i386/avx256-unaligned-store-7.c: Likewise.
>
> @@ -203,19 +203,37 @@
>        return standard_sse_constant_opcode (insn, operands[1]);
>      case 1:
>      case 2:
> +      if (GET_MODE_ALIGNMENT (<MODE>mode) == 256
> +          && ((TARGET_AVX256_SPLIT_UNALIGNED_STORE
> +               && MEM_P (operands[0])
> +               && MEM_ALIGN (operands[0]) < 256)
> +              || (TARGET_AVX256_SPLIT_UNALIGNED_LOAD
> +                  && MEM_P (operands[1])
> +                  && MEM_ALIGN (operands[1]) < 256)))
> +        gcc_unreachable ();

Please use "misaligned_operand (operands[...], <MODE>mode)" instead of
MEM_P && MEM_ALIGN combo in a couple of places.

OK with that change.

Thanks,
Uros.
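
For illustration only (not part of the patch): a standalone sketch of what the
requested predicate boils down to. It assumes misaligned_operand accepts a MEM
whose known alignment is below the mode's alignment, which is what the
open-coded MEM_P && MEM_ALIGN test expresses. The struct and both function
names are hypothetical stand-ins, not GCC's rtx type or API.

```c
#include <stdbool.h>

/* Hypothetical model of an operand; GCC's real operands are rtx values.  */
struct operand_model
{
  bool is_mem;          /* stands in for MEM_P (op) */
  unsigned align_bits;  /* stands in for MEM_ALIGN (op), in bits */
};

/* Sketch of the predicate: true only for a memory operand whose known
   alignment is smaller than the alignment the mode requires.  */
bool
misaligned_operand_model (const struct operand_model *op,
                          unsigned mode_align_bits)
{
  return op->is_mem && op->align_bits < mode_align_bits;
}

/* Sketch of the sanity check from the hunk above, rewritten on top of the
   predicate.  Returns true where the pattern would hit gcc_unreachable ().
   split_store and split_load model TARGET_AVX256_SPLIT_UNALIGNED_STORE
   and TARGET_AVX256_SPLIT_UNALIGNED_LOAD.  */
bool
avx256_split_check_fires (bool split_store, bool split_load,
                          const struct operand_model *dst,
                          const struct operand_model *src,
                          unsigned mode_align_bits)
{
  return mode_align_bits == 256
         && ((split_store
              && misaligned_operand_model (dst, mode_align_bits))
             || (split_load
                 && misaligned_operand_model (src, mode_align_bits)));
}
```

With the predicate, each branch of the condition shrinks to a single call per
operand, and the literal 256 in the alignment comparison no longer has to be
repeated.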