> On 22 Nov 2017, at 17:33, Jeff Law <l...@redhat.com> wrote:
> 
> On 11/22/2017 04:31 AM, Alan Hayward wrote:
>> 
>>> On 21 Nov 2017, at 03:13, Jeff Law <l...@redhat.com> wrote:
>>>> 
>>>>> 
>>>>> You might also look at TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  I'd
>>>>> totally forgotten about it.  And in fact it seems to come pretty close
>>>>> to what you need…
>>>> 
>>>> Yes, some of the code is similar to the way
>>>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED works. Both that code and the
>>>> CLOBBER expr code served as a starting point for writing the patch. The
>>>> main difference here is that _PART_CLOBBERED applies around all calls and
>>>> is not tied to a specific instruction; it’s part of the calling ABI.
>>>> Whereas clobber_high is explicitly tied to an expression (tls_desc).
>>>> It meant there wasn’t really any opportunity to reuse any existing code.
>>> Understood.  Though your first patch mentions that you're trying to
>>> describe partial preservation "around TLS calls". Presumably those are
>>> represented as normal insns, not call_insn.
>>> 
>>> That brings me back to Richi's idea of exposing a set of the low subreg
>>> to itself using whatever mode is wide enough to cover the neon part of
>>> the register.
>>> 
>>> That should tell the generic parts of the compiler that you're just
>>> clobbering the upper part and at least in theory you can implement in
>>> the aarch64 backend and the rest of the compiler should "just work"
>>> because that's the existing semantics of a subreg store.
>>> 
>>> The only worry would be if a pass tried to get overly smart and
>>> considered that kind of set a nop -- but I think I'd argue that's simply
>>> wrong given the semantics of a partial store.
>>> 
>> 
>> So, instead of using clobber_high(reg X), the idea is to use set(reg X, reg X).
>> It’s something we considered, and then dismissed.
>> 
>> The problem then is you are now using SET semantics on those registers, and
>> it would make the register live around the function, which might not be the
>> case.
>> Whereas clobber semantics will just make the register dead - which is exactly
>> what we want (but only conditionally).
> ?!?  A set of the subreg is the *exact* semantics you want.  It says the
> low part is preserved while the upper part is clobbered across the TLS
> insns.
> 
> jeff

Consider where the TLS call is inside a loop. The compiler would normally want
to hoist that out of the loop. By adding a set(x,x) into the parallel of the
tls_desc we are now making x live across the loop: x is dependent on the value
from the previous iteration, and the tls_desc can no longer be hoisted.
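
As a rough illustration of the hoisting argument, here is a toy model in Python. It is not how GCC's loop-invariant motion is implemented; the Insn class, the is_hoistable rule, and the register names (x0, x1, v8) are all assumptions made up for this sketch. The rule is simply that an insn can leave the loop only if nothing it reads is written inside the loop.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insn:
    """Toy instruction: the registers it reads, writes, and clobbers."""
    name: str
    uses: frozenset = frozenset()
    sets: frozenset = frozenset()
    clobbers: frozenset = frozenset()

def is_hoistable(insn, loop_body):
    """Loop-invariant in this toy model: nothing the insn reads is written
    (set or clobbered) by any insn in the loop, itself included."""
    written = set()
    for other in loop_body:
        written |= other.sets | other.clobbers
    return not (insn.uses & written)

# tls_desc with a plain clobber of v8: it reads nothing defined in the loop.
tls_clobber = Insn("tls_desc", sets=frozenset({"x0"}),
                   clobbers=frozenset({"v8"}))
# tls_desc with set(v8, v8) in its parallel: v8 is both read and written.
tls_set = Insn("tls_desc", uses=frozenset({"v8"}),
               sets=frozenset({"x0", "v8"}))
# Hypothetical consumer of the TLS address inside the loop.
use_result = Insn("add", uses=frozenset({"x0", "x1"}), sets=frozenset({"x1"}))

print(is_hoistable(tls_clobber, [tls_clobber, use_result]))  # True
print(is_hoistable(tls_set, [tls_set, use_result]))          # False
```

In this model the set(x,x) form fails only because the insn reads a register that it also writes on every iteration, which is the loop-carried dependence described above.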

Or consider a stream of code containing two tls_desc calls (ok, the compiler
might optimise one of the tls calls away, but this approach should be reusable
for other exprs). Between the two set(x,x)’s, x is considered live, so the
register allocator can’t use that register.
Given that we are applying this to all the neon registers, the register
allocator now throws an ICE because it can’t find any free hard neon registers
to use.
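
The register pressure problem can be sketched with a toy backward liveness pass in Python. Again this is only an assumed model, not GCC's dataflow code: each insn is a hypothetical (uses, defs) pair, the "middle" insn is invented for the example, and the register file is taken to be the 32 AArch64 SIMD registers v0..v31.

```python
# Illustrative register file: the 32 AArch64 SIMD registers v0..v31.
NEON = frozenset(f"v{i}" for i in range(32))

def live_in_sets(insns):
    """Backward liveness over straight-line code.

    Each insn is a (uses, defs) pair of register-name sets. Returns, for
    each insn, the set of registers live immediately before it.
    """
    live = set()
    result = [None] * len(insns)
    for i in range(len(insns) - 1, -1, -1):
        uses, defs = insns[i]
        live = (live - set(defs)) | set(uses)
        result[i] = set(live)
    return result

def free_neon_between(tls):
    """NEON registers not live at an insn sitting between two tls_desc insns."""
    middle = ({"x0"}, {"x1"})  # hypothetical insn between the two calls
    live = live_in_sets([tls, middle, tls])
    return NEON - live[1]

tls_clobber = (frozenset(), NEON)  # clobber: kills each register, reads nothing
tls_set = (NEON, NEON)             # set(x,x): reads and writes each register

print(len(free_neon_between(tls_clobber)))  # 32
print(len(free_neon_between(tls_set)))      # 0
```

With the clobber form every NEON register is dead between the two insns; with the set(x,x) form the second insn's reads keep all of them live, leaving nothing for the allocator in this model.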


Alan.
