Tamar Christina <tamar.christ...@arm.com> writes:
>  […]
>> >> > We generate for e.g.:
>> >> >
>> >> > #include <stdint.h>
>> >> >
>> >> > uint16_t f8 (uint8_t xr, uint8_t xc){
>> >> >     return (uint8_t)(xr * xc);
>> >> > }
>> >> >
>> >> > (insn 9 6 10 2 (set (reg:HI 101)
>> >> (zero_extend:HI (reg/v:QI 96 [ xr ]))) "prom.c":4:16 -1
>> >> (nil))
>> >> (insn 10 9 11 2 (set (reg:HI 102)
>> >> (zero_extend:HI (reg/v:QI 98 [ xc ]))) "prom.c":4:16 -1
>> >> (nil))
>> >> (insn 11 10 12 2 (set (reg:SI 103)
>> >> (mult:SI (subreg:SI (reg:HI 101) 0)
>> >> (subreg:SI (reg:HI 102) 0))) "prom.c":4:16 -1
>> >> (nil))
>> >> >
>> >> > Out of expand. The paradoxical subreg isn't generated at all out of
>> >> > expand unless it's needed. It does keep the original params around
>> >> > as
>> >> unused:
>> >> >
>> >> > (insn 2 7 4 2 (set (reg:QI 97)
>> >> (reg:QI 0 x0 [ xr ])) "prom.c":3:37 -1
>> >> (nil))
>> >> (insn 4 2 3 2 (set (reg:QI 99)
>> >> (reg:QI 1 x1 [ xc ])) "prom.c":3:37 -1
>> >> (nil))
>> >> >
>> >> > And the paradoxical subreg is moved into the first operation requiring 
>> >> > it:
>> >> >
>> >> > (insn 11 10 12 2 (set (reg:SI 103)
>> >> (mult:SI (subreg:SI (reg:HI 101) 0)
>> >> (subreg:SI (reg:HI 102) 0))) "prom.c":4:16 -1
>> >> (nil))
>> >>
>> >> Ah, OK, this isn't what I'd imaagined.  I thought the xr and xc
>> >> registers would be SIs and the DECL_RTLs would be QI subregs of those SI
>> regs.
>> >> I think that might work better, for the reasons above.  (That is,
>> >> whenever we need the register in extended form, we can simply extend
>> >> the existing reg rather than create a new one.)
>> >
>> > Ah, I see, no, I explicitly avoid this. When doing the type promotions
>> > I tell it that size of the copies of xr and xc is still the original size, 
>> > e.g. QI (i.e. I
>> don't change 97 and 99).
>> > This is different from what we do with extends where 97 and 99 *would*
>> be changed.
>> >
>> > The reason is that if I make this SI the compiler thinks it knows the
>> > value of all the bits in the register which led to various miscompares as 
>> > it
>> thinks it can use the SI value directly.
>> >
>> > This happens because again the xr and xc are hard regs. So having 97
>> > be
>> >
>> > (set (reg:SI 97) (subreg:SI (reg:QI 0 x0 [ xr ]) 0))
>> >
>> > gets folded to an incorrect
>> >
>> > (set (reg:SI 97) (reg:SI 0 x0 [ xr ]))
>> 
>> This part I would expect (and hope for :-)).
>> 
>> > And now 97 is free to be used without any zero extension, as 97 on it's own
>> is an invalid RTX.
>> 
>> But the way I'd imagined it working, expand would need to insert an
>> extension before any operation that needs the upper 24 bits to be defined
>> (e.g. comparisons, right shifts).  If the DECL_RTL is (subreg:QI (reg:SI x) 
>> 0)
>> then the upper bits are not defined, since SUBREG_PROMOTED_VAR_P
>> would/should be false for the subreg.
>
> Ah I see, my fear here was that if we have a pattern which splits out the 
> zero-extend for whatever reason
> that if it gets folded it would be invalid.  But I think I understand what 
> you meant.  In your case
> we'd never again use the hardreg, but that everything goes through 97. Got it.

Yeah.  The expand code is supposed to move the hard register into a
pseudo at the earliest opportunity (at the head of the function) and
then everything else should use the pseudo.  Using the hard register
later could lead to spill failures, or to attempts to keep the register
live across calls.

>> E.g. for:
>> 
>>   int8_t foo(int8_t x) { return x >> 1; }
>> 
>> x would have a DECL_RTL of (subreg:QI (reg:SI x) 0), the parameter
>> assignment would be expanded as:
>> 
>>   (set (reg:SI x) (reg:SI x0))
>> 
>> the shift would be expanded as:
>> 
>>   (set (reg:SI x) (zero_extend:SI (subreg:QI (reg:SI x) 0)))
>>   (set (reg:SI x) (ashiftrt:SI (reg:SI x) (const_int 1)))
>> 
>> and the return assignment would be expanded as:
>> 
>>   (set (reg:SI x0) (reg:SI x))
>> 
>> x + 1 would instead be expanded to just:
>> 
>>   (set (reg:SI x) (plus:SI (reg:SI x) (const_int 1)))
>> 
>> (without an extension).
>> 
>> I realised later though that, although reusing the DECL_RTL reg for the
>> extension has the nice RA property of avoiding multiple live values, it would
>> make it harder to combine the extension into the operation if the variable is
>> still live afterwards.  So I guess we lose something both ways.
>> 
>> Maybe we need a different approach, not based on changing
>> PROMOTE_MODE.
>> 
>> I wonder how easy it would be to do the promotion in gimple, then reuse
>> backprop to determine when a sign/zero-extension (i.e. a normal gimple cast)
>> can be converted into an “any extend”
>> (probably represented as a new ifn).
>
> Do you mean without changing the hook implementation but keeping the current 
> promotion?

Yeah, keep the hook definitions as they are now, but do the promotion
in gimple by widening types where necessary.

> I guess the problem here is that it's the inverse cases that's the problem 
> isn't it? It's not that in
> gimple there are unneeded extends, it's that some operations require an 
> any-extend no?
>
> like in gimple ~a where a is an 8-bit quantity requires an any-extend, but no 
> cast would be there
> in gimple.
>
> So for instance
>
> #include <stdint.h>
>
> uint8_t f (uint8_t a)
> {
>     return ~a;
> }
>
> Is just simply:
>
> f (uint8_t a)
> {
>   uint8_t _2;
>
>   <bb 2> [local count: 1073741824]:
>   _2 = ~a_1(D);
>   return _2;
>
> }

Right.  But the idea was to run a late-ish isel-like pass that converts
this to:

f (uint8_t a)
{
  uint8_t _2;
  uint32_t _3;
  uint8_t _4;

  <bb 2> [local count: 1073741824]:
  _2 = .ANY_EXTEND(a_1(D), (uint32_t)0);
  _3 = ~_2;
  _4 = (uint8_t)_3;
  return _4;
}

We'd need to experiment with the best placing of the pass.

> In gimple. I'm also slightly worried about interfering with phi opts. 
> Backprop runs
> before ifcombine and pihops for instance and there are various phi opts like 
> ifcombine_ifandif
> that rely on the BB containing only the phi node.  Adding an any-extend in 
> between would break this.

Yeah, the current backprop pass runs quite early.  But I meant that we
could reuse the code (or simply run another instance of the pass,
extended to handle this case) at the appropriate point.

> I also wonder if the IFN won't interfere with range analysis of expressions. 
> Unless we manage to strategically
> Insert the IFNs on entire expressions and not in the intermediate SSA 
> components.

This would be part of the trade-off in placing the promotion pass.

Thanks,
Richard

Reply via email to