> On 22 Nov 2017, at 17:33, Jeff Law <l...@redhat.com> wrote:
>
> On 11/22/2017 04:31 AM, Alan Hayward wrote:
>>
>>> On 21 Nov 2017, at 03:13, Jeff Law <l...@redhat.com> wrote:
>>>>
>>>>>
>>>>> You might also look at TARGET_HARD_REGNO_CALL_PART_CLOBBERED. I'd
>>>>> totally forgotten about it. And in fact it seems to come pretty close
>>>>> to what you need…
>>>>
>>>> Yes, some of the code is similar to the way
>>>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED works. Both that code and the
>>>> CLOBBER expr code served as a starting point for writing the patch.
>>>> The main difference here is that _PART_CLOBBERED applies around all
>>>> calls and is not tied to a specific instruction; it’s part of the
>>>> calling ABI. Whereas clobber_high is explicitly tied to an expression
>>>> (tls_desc). It meant there wasn’t really any opportunity to reuse any
>>>> existing code.
>>> Understood. Though your first patch mentions that you're trying to
>>> describe partial preservation "around TLS calls". Presumably those are
>>> represented as normal insns, not call_insn.
>>>
>>> That brings me back to Richi's idea of exposing a set of the low subreg
>>> to itself using whatever mode is wide enough to cover the neon part of
>>> the register.
>>>
>>> That should tell the generic parts of the compiler that you're just
>>> clobbering the upper part and at least in theory you can implement in
>>> the aarch64 backend and the rest of the compiler should "just work"
>>> because that's the existing semantics of a subreg store.
>>>
>>> The only worry would be if a pass tried to get overly smart and
>>> considered that kind of set a nop -- but I think I'd argue that's simply
>>> wrong given the semantics of a partial store.
>>>
>>
>> So, instead of using clobber_high(reg X), use set(reg X, reg X).
>> It’s something we considered, and then dismissed.
>>
>> The problem then is you are now using SET semantics on those registers, and
>> it would make the register live around the function, which might not be
>> the case. Whereas clobber semantics will just make the register dead -
>> which is exactly what we want (but only conditionally).
> ?!? A set of the subreg is the *exact* semantics you want. It says the
> low part is preserved while the upper part is clobbered across the TLS
> insns.
>
> jeff
Consider the case where the TLS call is inside a loop. The compiler would
normally want to hoist it out of the loop. By adding a set(x,x) to the
parallel of the tls_desc, we make x live across the loop: x now depends on
the value from the previous iteration, and the tls_desc can no longer be
hoisted.

Or consider a stream of code containing two tls_desc calls (OK, the compiler
might optimise one of the TLS calls away, but this approach should be
reusable for other exprs). Between the two set(x,x)s, x is considered live,
so the register allocator can’t use that register. Given that we are
applying this to all the neon registers, the register allocator now throws
an ICE because it can’t find any free hard neon registers to use.

Alan.
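A toy backward-liveness model may make the loop and two-call arguments concrete. This is a sketch under stated assumptions, not GCC's df/liveness code: registers are tracked whole (no subregs or partial clobbers), `v0` stands in for one neon register, and `Insn`, `tls_desc`, `loop_cfg` and `two_calls_cfg` are hypothetical names. A self-set is modelled as an insn that reads and writes `v0`; a clobber as one that writes `v0` without reading it. Since the source operand of a subreg self-set is still a read of the low part, a subreg-accurate model should give the same "live" result for that part. Hoisting and register allocation themselves are not modelled; the sketch only shows where `v0` becomes live.

```python
"""Toy liveness model: tls_desc annotated with set(v0, v0) vs clobber(v0).

Assumptions (illustrative only, not GCC's implementation):
  * whole registers only; subregs and partial clobbers are ignored;
  * "set" style  = insn both uses and defines v0;
  * "clobber" style = insn defines v0 without using it.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    name: str
    uses: frozenset = frozenset()
    defs: frozenset = frozenset()


def tls_desc(style: str, reg: str = "v0") -> Insn:
    """Hypothetical tls_desc insn carrying either a self-set or a clobber."""
    if style == "set":
        return Insn("tls_desc", uses=frozenset({reg}), defs=frozenset({reg}))
    if style == "clobber":
        return Insn("tls_desc", defs=frozenset({reg}))
    raise ValueError(f"unknown style: {style}")


def liveness(blocks, succs):
    """Iterate block live-in/live-out sets to a fixed point.

    blocks: block name -> list of Insn (in program order)
    succs:  block name -> list of successor block names
    """
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # live-out is the union of the successors' live-in sets
            out = set().union(*(live_in[s] for s in succs.get(b, [])))
            live = set(out)
            # walk the block backwards: kill defs, then add uses
            for insn in reversed(blocks[b]):
                live = (live - insn.defs) | insn.uses
            if out != live_out[b] or live != live_in[b]:
                live_out[b], live_in[b] = out, live
                changed = True
    return live_in, live_out


def loop_cfg(style):
    """A loop whose body is a single tls_desc, with a back edge to itself."""
    blocks = {"entry": [], "loop": [tls_desc(style)], "exit": []}
    succs = {"entry": ["loop"], "loop": ["loop", "exit"]}
    return blocks, succs


def two_calls_cfg(style):
    """Two tls_desc insns with unrelated work in between."""
    blocks = {
        "first": [tls_desc(style)],
        "middle": [Insn("work")],
        "second": [tls_desc(style)],
    }
    succs = {"first": ["middle"], "middle": ["second"]}
    return blocks, succs


if __name__ == "__main__":
    for style in ("set", "clobber"):
        loop_in, _ = liveness(*loop_cfg(style))
        pair_in, _ = liveness(*two_calls_cfg(style))
        print(f"{style:8} v0 live into loop: {'v0' in loop_in['loop']}, "
              f"v0 live between calls: {'v0' in pair_in['middle']}")
```

With the "set" style, `v0` is live on entry to the loop and around its back edge, so the tls_desc reads a value from the previous iteration. It is also live throughout the block between the two calls, so nothing else could be allocated to `v0` there. With the "clobber" style, `v0` is dead in both cases.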