Re: [PATCH][aarch64] Fix target/pr77729 - missed optimization related to zero extension

Jeff Law Thu, 14 Sep 2017 08:04:11 -0700

On 09/13/2017 03:46 PM, Steve Ellcey wrote:
> On Wed, 2017-09-13 at 14:46 -0500, Segher Boessenkool wrote:
>> On Wed, Sep 13, 2017 at 06:13:50PM +0100, Kyrill Tkachov wrote:
>>>  
>>> We are usually hesitant to add explicit subreg matching in the MD pattern
>>> (though I don't remember if there's a hard rule against it).
>>> In this case this looks like a missing simplification from combine
>>> (simplify-rtx) so I think adding it there would be better.
> 
>> Yes, it probably belongs as a generic simplification in simplify-rtx.c;
>> if there is a reason not to do that, it can be done in combine.c
>> instead.
> 
> Actually, now that I look at it some more and compare it to the arm32
> version (where we do not have this problem) I think the problem starts
> well before combine.
> 
> In arm32 rtl expansion, when reading the QI memory location, I see
> these instructions get generated:
> 
> (insn 10 3 11 2 (set (reg:SI 119)
>         (zero_extend:SI (mem:QI (reg/v/f:SI 117 [ string ]) [0 *string_9(D)+0 S1 A8]))) "pr77729.c":4 -1
>      (nil))
> (insn 11 10 12 2 (set (reg:QI 118)
>         (subreg:QI (reg:SI 119) 0)) "pr77729.c":4 -1
>      (nil))
> 
> And in aarch64 rtl expansion I see:
> 
> (insn 10 9 11 (set (reg:QI 81)
>         (mem:QI (reg/v/f:DI 80 [ string ]) [0 *string_9(D)+0 S1 A8])) "pr77729.c":3 -1
>      (nil))
> 
> Both of these sequences expand to ldrb but in the arm32 case I know
> that I set all 32 bits of the register (even though I only want the
> bottom 8 bits), but for aarch64 I only know that I set the bottom 8
> bits and I don't know anything about the higher bits, meaning I have to
> keep the AND instruction to mask out the upper bits on aarch64.
It's one of the reasons I discourage subregs: in my experience the
number of cases where we can optimize based on the "don't care"
semantics is relatively small, and I consistently see cases where the
"don't care" property of the subreg turns into "don't know" and
suppresses downstream optimizations.


It's always a judgment call, but more and more often I find myself
pushing towards defining those bits using a zero/sign extension, bit
operation or whatever rather than using subregs.


> 
> I think we should change the movqi/movhi expansions on aarch64 to
> recognize that the ldrb/ldrh instructions zero out the upper bits in
> the register by generating rtl like arm32 does.
Is LOAD_EXTEND_OP defined for aarch64?

It may also be worth looking at ree.c, the redundant extension
elimination pass; my recollection is that it didn't handle subregs,
but it could and probably should.

jeff
