On Wed, Apr 4, 2012 at 5:39 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
> On Wed, Apr 4, 2012 at 5:07 PM, Teresa Johnson <tejohn...@google.com> wrote:
>> New patch to avoid LCP (length-changing prefix) stalls, based on feedback
>> from the earlier patch. I revived H.J.'s old patch to perform the peephole2
>> that splits immediate moves to HImode memory. This is now enabled for Core2,
>> Corei7 and Generic.
>> I verified that this enables the splitting to occur in the case that
>> motivated the optimization. If we subsequently find situations where LCP
>> stalls are hurting performance but an extra register is required to perform the
>> splitting, then we can revisit whether this should be performed earlier.
>> I also measured SPEC 2000/2006 performance using Generic64 on an AMD Opteron
>> and the results were neutral.
> What are the performance impacts on Core i7? I didn't notice any significant
> changes when I worked on it for Core 2.
One of our street map applications speeds up by almost 5% on Corei7
and almost 2.5% on Core2 from this optimization. It contains a hot
inner loop with some conditional writes of zero into a short array.
The loop is unrolled, so it no longer fits into the LSD (Loop Stream
Detector), which would otherwise have avoided many of the LCP stalls.
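
To make the pattern concrete, here is a minimal sketch of the kind of loop I
mean. It is not the application's code: the function name, parameters and the
condition are made up for illustration, and the instruction sequences in the
comment are what I would expect to see, not compiler output I am quoting.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the hot loop: conditionally store zero into an
   array of 16-bit values. Returns how many elements were cleared. */
size_t clear_below(int16_t *cells, const int16_t *keys, size_t n,
                   int16_t threshold)
{
    size_t cleared = 0;
    for (size_t i = 0; i < n; i++) {
        if (keys[i] < threshold) {
            /* Without the split, this store can be emitted as
                   movw $0, (mem)
               where the 0x66 operand-size prefix changes the immediate
               from 32 to 16 bits, which is the LCP stall case.
               With the split, the immediate goes through a register:
                   xorl %eax, %eax
                   movw %ax, (mem)
               The store still has the prefix but no immediate, so the
               instruction length is unaffected by it. */
            cells[i] = 0;
            cleared++;
        }
    }
    return cleared;
}
```

Once a loop like this is unrolled, each copy of the body gets its own 16-bit
immediate store, so the stalls add up in the decoder.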
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413