On Wed, Apr 4, 2012 at 5:39 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
> On Wed, Apr 4, 2012 at 5:07 PM, Teresa Johnson <tejohn...@google.com> wrote:
>> New patch to avoid LCP stalls based on feedback from earlier patch. I modified
>> H.J.'s old patch to perform the peephole2 to split immediate moves to HImode
>> memory. This is now enabled for Core2, Corei7 and Generic.
>>
>> I verified that this enables the splitting to occur in the case that originally
>> motivated the optimization. If we subsequently find situations where LCP stalls
>> are hurting performance but an extra register is required to perform the
>> splitting, then we can revisit whether this should be performed earlier.
>>
>> I also measured SPEC 2000/2006 performance using Generic64 on an AMD Opteron
>> and the results were neutral.
>>
>
> What are the performance impacts on Core i7? I didn't notice any significant
> changes when I worked on it for Core 2.

One of our street map applications speeds up by almost 5% on Corei7 and
almost 2.5% on Core2 from this optimization. It contains a hot inner loop
with some conditional writes of zero into a short array. Because the loop is
unrolled, it no longer fits into the LSD (Loop Stream Detector), which would
otherwise have avoided many of the LCP stalls.

Thanks,
Teresa

>
> Thanks.
>
> --
> H.J.

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
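
For anyone following the thread, here is a minimal sketch of the kind of loop described above. It is not the application's code: the function name `clear_above`, its parameters, and the comparison against a limit are all hypothetical, chosen only to show a conditional store of zero into an array of 16-bit elements.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the hot loop: zero every entry above `limit`.
 * The store `cells[i] = 0` is a 16-bit (HImode) write of an immediate.
 * On x86 a compiler may encode it as `movw $0, (mem)`, where the 0x66
 * operand-size prefix changes the length of the immediate field. That is
 * a length-changing prefix (LCP) and can stall the decoder on Core2 and
 * Corei7. The peephole2 in the patch would instead load the constant into
 * a free register and store that register, avoiding the 16-bit immediate.
 */
size_t clear_above(int16_t *cells, size_t n, int16_t limit)
{
    size_t cleared = 0;

    for (size_t i = 0; i < n; i++) {
        if (cells[i] > limit) {
            cells[i] = 0;   /* conditional 16-bit store of zero */
            cleared++;
        }
    }
    return cleared;
}
```

Whether a given compiler emits the immediate form for this loop depends on the target tuning and on register availability, so inspecting the assembly (for example with `gcc -O2 -S`) is the way to check.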