This change revises the memory barrier patterns to use the ldcw instruction instead of the sync instruction. The sync instruction performs better and I have more confidence in it than sync.
We use a location just above the top of the stack for these operations. The stack address is aligned to a 16-byte boundary if the system is not coherent. I have added two new options. The first is the -mcoherent-ldcw option. The majority of PA 2.0 system have coherent caches and as a result the coherent ldcw completer can be used. In that case, the ldcw address doesn't require 16-byte alignment. We set the default to -mcoherent-ldcw. The second option is the -mordered option. Although all PA 1.x systems have ordered memory accesses, PA 2.0 systems are weakly ordered. Since PA 2.0 are now prevalent, we set the default to -mno-ordered. For ordered systems, we fall back to just a compiler memory barrier. I believe acquire and release fences can be defined in a similar way using an ordered load and an ordered store, respectively. Tested on hppa-unknown-linux-gnu, hppa2.0w-hp-hpux11.11 and hppa64-hp-hpux11.11. Committed to trunk. Dave 2019-11-07 John David Anglin <dang...@gcc.gnu.org> * config/pa/pa.md (memory_barrier): Revise to use ldcw barriers. Enhance comment. (memory_barrier_coherent, memory_barrier_64, memory_barrier_32): New insn patterns using ldcw instruction. (memory_barrier): Remove insn pattern using sync instruction. * config/pa/pa.opt (coherent-ldcw): New option. (ordered): New option. Index: config/pa/pa.md =================================================================== --- config/pa/pa.md (revision 277870) +++ config/pa/pa.md (working copy) @@ -10086,23 +10086,55 @@ (set_attr "length" "4,16")]) ;; PA 2.0 hardware supports out-of-order execution of loads and stores, so -;; we need a memory barrier to enforce program order for memory references. -;; Since we want PA 1.x code to be PA 2.0 compatible, we also need the -;; barrier when generating PA 1.x code. +;; we need memory barriers to enforce program order for memory references +;; when the TLB and PSW O bits are not set. We assume all PA 2.0 systems +;; are weakly ordered since neither HP-UX or Linux set the PSW O bit. Since +;; we want PA 1.x code to be PA 2.0 compatible, we also need barriers when +;; generating PA 1.x code even though all PA 1.x systems are strongly ordered. +;; When barriers are needed, we use a strongly ordered ldcw instruction as +;; the barrier. Most PA 2.0 targets are cache coherent. In that case, we +;; can use the coherent cache control hint and avoid aligning the ldcw +;; address. In spite of its description, it is not clear that the sync +;; instruction works as a barrier. + (define_expand "memory_barrier" - [(set (match_dup 0) - (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))] + [(parallel + [(set (match_dup 0) (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER)) + (clobber (match_dup 1))])] "" { - operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode)); + /* We don't need a barrier if the target uses ordered memory references. */ + if (TARGET_ORDERED) + FAIL; + operands[1] = gen_reg_rtx (Pmode); + operands[0] = gen_rtx_MEM (BLKmode, operands[1]); MEM_VOLATILE_P (operands[0]) = 1; }) -(define_insn "*memory_barrier" +(define_insn "*memory_barrier_coherent" [(set (match_operand:BLK 0 "" "") - (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))] + (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER)) + (clobber (match_operand 1 "pmode_register_operand" "=r"))] + "TARGET_PA_20 && TARGET_COHERENT_LDCW" + "ldcw,co 0(%%sp),%1" + [(set_attr "type" "binary") + (set_attr "length" "4")]) + +(define_insn "*memory_barrier_64" + [(set (match_operand:BLK 0 "" "") + (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER)) + (clobber (match_operand 1 "pmode_register_operand" "=&r"))] + "TARGET_64BIT" + "ldo 15(%%sp),%1\n\tdepd %%r0,63,3,%1\n\tldcw 0(%1),%1" + [(set_attr "type" "binary") + (set_attr "length" "12")]) + +(define_insn "*memory_barrier_32" + [(set (match_operand:BLK 0 "" "") + (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER)) + (clobber (match_operand 1 "pmode_register_operand" "=&r"))] "" - "sync" + "ldo 15(%%sp),%1\n\t{dep|depw} %%r0,31,3,%1\n\tldcw 0(%1),%1" [(set_attr "type" "binary") - (set_attr "length" "4")]) + (set_attr "length" "12")]) Index: config/pa/pa.opt =================================================================== --- config/pa/pa.opt (revision 277870) +++ config/pa/pa.opt (working copy) @@ -45,6 +45,10 @@ Target Report Mask(CALLER_COPIES) Caller copies function arguments passed by hidden reference. +mcoherent-ldcw +Target Report Var(TARGET_COHERENT_LDCW) Init(1) +Use ldcw/ldcd coherent cache-control hint. + mdisable-fpregs Target Report Mask(DISABLE_FPREGS) Disable FP regs. @@ -90,6 +94,10 @@ Target RejectNegative Report Mask(NO_SPACE_REGS) Disable space regs. +mordered +Target Report Var(TARGET_ORDERED) Init(0) +Assume memory references are ordered and barriers are not needed. + mpa-risc-1-0 Target RejectNegative Generate PA1.0 code.