Hi, I modified the patch as H.J. suggested (patch attached).
Is it OK to commit to trunk now?

Thanks,

Changpeng

________________________________________
From: H.J. Lu [hjl.to...@gmail.com]
Sent: Friday, June 17, 2011 5:44 PM
To: Fang, Changpeng
Cc: Richard Guenther; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic

On Fri, Jun 17, 2011 at 3:18 PM, Fang, Changpeng <changpeng.f...@amd.com> wrote:
> Hi,
>
> I added AVX256_SPLIT_UNALIGNED_STORE to ix86_tune_indices
> and put m_COREI7, m_BDVER1 and m_GENERIC as the targets that
> enable it.
>
> Is this OK?

Can you do something similar to how MASK_ACCUMULATE_OUTGOING_ARGS
is handled?

Thanks.


H.J.
From 50310fc367348b406fc88d54c3ab54d1a304ad52 Mon Sep 17 00:00:00 2001
From: Changpeng Fang <chfang@huainan.(none)>
Date: Mon, 13 Jun 2011 13:13:32 -0700
Subject: [PATCH 2/2] pr49089: enable avx256 splitting unaligned load/store only when beneficial

	* config/i386/i386.c (x86_avx256_split_unaligned_load): New definition.
	(x86_avx256_split_unaligned_store): New definition.
	(ix86_option_override_internal): Enable avx256 unaligned load(store)
	splitting only when the tuning target is in
	x86_avx256_split_unaligned_load(store).
---
 gcc/config/i386/i386.c |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7b266b9..3bc0b53 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2121,6 +2121,12 @@ static const unsigned int x86_arch_always_fancy_math_387
   = m_PENT | m_ATOM | m_PPRO | m_AMD_MULTIPLE | m_PENT4
   | m_NOCONA | m_CORE2I7 | m_GENERIC;
 
+static const unsigned int x86_avx256_split_unaligned_load
+  = m_COREI7 | m_GENERIC;
+
+static const unsigned int x86_avx256_split_unaligned_store
+  = m_COREI7 | m_BDVER1 | m_GENERIC;
+
 /* In case the average insn count for single function invocation is
    lower than this constant, emit fast (but longer) prologue and
    epilogue code.  */
@@ -4194,9 +4200,11 @@ ix86_option_override_internal (bool main_args_p)
 	  if (flag_expensive_optimizations
 	      && !(target_flags_explicit & MASK_VZEROUPPER))
 	    target_flags |= MASK_VZEROUPPER;
-	  if (!(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
+	  if ((x86_avx256_split_unaligned_load & ix86_tune_mask)
+	      && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
 	    target_flags |= MASK_AVX256_SPLIT_UNALIGNED_LOAD;
-	  if (!(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_STORE))
+	  if ((x86_avx256_split_unaligned_store & ix86_tune_mask)
+	      && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_STORE))
 	    target_flags |= MASK_AVX256_SPLIT_UNALIGNED_STORE;
 	}
     }
-- 
1.7.0.4