Hello guys,

So, what about the patch? I think since we already have zee patch it
would be great to use it as more general optimization. I tested it on
EEMBC 2.0 on Atom and it showed 1% performance gain in geomean on 32
bit which is really good for such simple optimization. For OOO archs
patch is not so critical but still makes code cleaner

Thanks,
Ilya

2011/11/6 Ilya Enkovich <enkovich....@gmail.com>:
> 2011/11/6 Richard Guenther <richard.guent...@gmail.com>:
>> On Sun, Nov 6, 2011 at 11:50 AM, Ilya Enkovich <enkovich....@gmail.com> 
>> wrote:
>>> Hello,
>>>
>>> 2011/11/5 Eric Botcazou <ebotca...@adacore.com>:
>>>>> Here is a patch which fixes redundant zero extensions problem. Issue
>>>>> is resolved by expanding implicit_zee pass functionality to cover zero
>>>>> and sign extends of different modes. Could please someone review it?
>>>>
>>>> Could you explain the undelying idea?  The current strategy of 
>>>> implicit-zee.c
>>>> is exposed at length at the beginning of the file, but here's a summary:
>>>>
>>>>  1. On some architectures (typically x86-64), implicity zero-extensions are
>>>> applied when instructions operate in selected sub-word modes (SImode here):
>>>>
>>>>  addl edi,eax
>>>>
>>>> has an implicit zero-extension for %rax.
>>>>
>>>>  2. Because of 1, the second instruction in sequences like:
>>>>
>>>>  (set (reg:SI x) (plus:SI (reg:SI z1) (reg:SI z2)))
>>>>  (set (reg:DI x) (zero_extend:DI (reg:SI x)))
>>>>
>>>> is redundant.
>>>>
>>>>  3. The pass recognizes this and transforms the above sequence into:
>>>>
>>>>  (set (reg:DI x) (zero_extend:DI (plus:SI (reg:SI z1) (reg:SI z2))))
>>>>
>>>> and the machine description knows how to translate this into an 'addl'.
>>>>
>>>>
>>>> You're proposing extending this to other modes and other architectures, for
>>>> example QImode on x86.  But does
>>>>
>>>>  addb %dl, %al
>>>>
>>>> modify the entire %eax register on x86?  In other words, are you really 
>>>> after
>>>> implicit (zero-)extensions or after something else, like global 
>>>> elimination of
>>>> redundant extensions?
>>> Initial aim of the pass was to remove zero extentions redundant due to
>>> implicit zero extention in x64. But implementation actually uses
>>> generic approach and seems like a mini-combiner. Pass may combine two
>>> zero extends or combine zero extend with a constant as a special case
>>> but in other cases we just try to merge two instructions and then
>>> check we have corresponding template. It can be easily adopted to
>>> remove all redundant extensions. So, byte add in the example will be
>>> merged with zxero extend only if we have explicit template for it in
>>> machine model.
>>>
>>>>
>>>> What's the effect of the patch on the testcase in the PR in terms of insns 
>>>> at
>>>> the RTL level?  Why doesn't the combiner already optimize it?
>>> The patch helps to remove two zero extends from RTL in the test from
>>> PR. I believe zee pass was introduced after postreload pass because we
>>> should have additional memory instructions by that time and therefore
>>> more opportunities for optimization after combiner work.
>>>
>>> In this particular test case combiner may also help because we have
>>> byte memory load and extend on combiner pass. But due to some reason
>>> it does not merge them. In combiner dump I see
>>>
>>> (insn 39 38 40 4 (set (reg/v:QI 81 [ xr ])
>>>        (mem:QI (reg/v/f:DI 111 [ ImageInPtr ]) [0 MEM[base:
>>> ImageInPtr_29, offset: 0B]+0 S1 A8])) 1.c:9 66 {*movqi_internal}
>>>     (nil))
>>>
>>> (insn 43 42 44 4 (parallel [
>>>            (set (reg:SI 116 [ xr ])
>>>                (zero_extend:SI (reg/v:QI 81 [ xr ])))
>>>            (clobber (reg:CC 17 flags))
>>>        ]) 1.c:11 121 {*zero_extendqisi2_movzbl_and}
>>>     (expr_list:REG_DEAD (reg/v:QI 81 [ xr ])
>>>        (expr_list:REG_UNUSED (reg:CC 17 flags)
>>>            (nil))))
>>>
>>> and
>>>
>>> Trying 39 -> 43:
>>>
>>> With no additional information.
>>
>> Well, I bet it's because of the CC clobber which is there
>> because of the use of TARGET_ZERO_EXTEND_WITH_AND.
>> Where does that insn get generated?  By combine itself?
>
> This insn is generated by expand:
>
> (insn 43 42 44 6 (parallel [
>            (set (reg:SI 116)
>                (zero_extend:SI (reg/v:QI 81 [ xr ])))
>            (clobber (reg:CC 17 flags))
>        ]) 1.c:11 -1
>     (nil))
>
>>
>> Richard.
>>
>>>> Enhancing implicit-zee.c to address missed optimizations like the one 
>>>> reported
>>>> in target/50038 might well be the best approach, but the strategy shift 
>>>> must be
>>>> clearly exposed and discussed.  The reported numbers are certainly 
>>>> impressive.
>>>>
>>>> --
>>>> Eric Botcazou
>>>>
>>>
>>> Ilya
>>>
>>
>

Reply via email to