On Tue, 2013-08-13 at 12:09 -0700, Richard Henderson wrote:
> On 08/13/2013 11:26 AM, Jakub Jelinek wrote:
> > On Fri, Aug 02, 2013 at 11:05:33AM -1000, Richard Henderson wrote:
> >> On 08/02/2013 04:45 AM, Andreas Krebbel wrote:
> >>> !   XCFLAGS="${XCFLAGS} -mzarch -mhtm -msoft-float"
> >>
> >> Not good, since _ITM_R{F,D,E,CF,CD,CE} should return values in
> >> floating point registers; similarly for the write accessors.
> >
> > So, would it be enough to compile just beginend.cc with -msoft-float
> > and the rest normally?
>
> No.
>
> From what I understand of the s390 restriction, we can't touch the
> floating point registers after starting a transaction, at least until
> we're committed to restoring them all.
>
> Which means that we have to have everything along the path from
> htm_begin_success == false until a longjmp restores them.  This path
> includes a call to std::operator new, so that makes it a non-starter.
>
> Better, I think, to pass the gtm_jmpbuf to htm_begin.  Then we can do
>
>     uint32_t ret = __builtin_tbegin_nofloat (NULL);
>     if (!htm_begin_success(ret))
>       // restore fpu state from jb
>     return ret;
>
> at which point we're back to normal and we can do whatever we want
> within the normal abi wrt the fpu.

Can we instead move the successful path of the HTM fastpath to the
_ITM_beginTransaction asm function, and just do the fallback and retry
code in beginend.cc?

So, add a tbegin (or xbegin) and the check of serial_lock to the asm
function as in Andi's patch
(http://gcc.gnu.org/ml/gcc-patches/2013-01/msg00640.html), but keep the
restart policy code (i.e., the serial lock read/write locking for the
wait, etc.) in beginend.cc?  The latter could return a new code that
tells the asm function to try the HTM fastpath again; we might also want
to use a separate bit in the arguments to gtm_thread::begin_transaction
to tell it whether the HTM fastpath failed before.

This way, we get lower overhead when the HTM fastpath succeeds right
away, because we can do the minimally necessary thing in asm; we can
still change the retry policies easily; and we can restore the floating
point registers to a sane state for any libitm C/C++ code that we might
call, without having to do any restore inside of the C/C++ code.

Thoughts?  I'll put a draft of this for x86 on my list.
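
To make the proposed split a bit more concrete, here is a rough sketch
of the control flow in portable C++.  It is only a model: the real
fastpath would live in the _ITM_beginTransaction asm stub, and every
name below (BeginResult, FLAG_HTM_FAILED, Hooks, begin_transaction_entry,
simulate) is hypothetical and not an existing libitm identifier.  The
callbacks stand in for tbegin/xbegin, the serial_lock check, the
floating point register restore, and the retry policy in beginend.cc.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical return codes; libitm's real action codes differ.
enum BeginResult : uint32_t {
  RUN_HTM = 1,       // hardware transaction started, fastpath taken
  RUN_SOFTWARE = 2,  // policy code picked a non-HTM way to run
  RETRY_HTM = 4      // policy code asks the entry stub to try HTM again
};

// Hypothetical argument bit: "the HTM fastpath failed before".
constexpr uint32_t FLAG_HTM_FAILED = 1u << 0;

// Stand-ins for the pieces that would be asm or beginend.cc in practice.
struct Hooks {
  std::function<bool()> htm_try_begin;          // models tbegin/xbegin
  std::function<bool()> serial_lock_held;       // models the serial_lock check
  std::function<void()> restore_fp_regs;        // models the FP restore in asm
  std::function<uint32_t(uint32_t)> slow_path;  // models the retry policy
};

// Models the asm entry stub: minimal work on success, and FP state is
// restored before any C/C++ policy code runs.
uint32_t begin_transaction_entry(const Hooks& h, uint32_t props) {
  for (;;) {
    // Real code would abort the hardware transaction if the serial lock
    // is held; this model just treats that case as a failed attempt.
    if (h.htm_try_begin() && !h.serial_lock_held())
      return RUN_HTM;
    h.restore_fp_regs();
    props |= FLAG_HTM_FAILED;
    uint32_t code = h.slow_path(props);
    assert(code == RETRY_HTM || code == RUN_SOFTWARE);
    if (code != RETRY_HTM)
      return code;
  }
}

// Usage sketch: HTM fails `failures` times; the mock policy allows at
// most `max_retries` retries before giving up on HTM.
struct Outcome { uint32_t code; int htm_attempts; int fp_restores; };

Outcome simulate(int failures, int max_retries) {
  int attempts = 0, restores = 0, retries = 0;
  Hooks h;
  h.htm_try_begin = [&] { return ++attempts > failures; };
  h.serial_lock_held = [] { return false; };
  h.restore_fp_regs = [&] { ++restores; };
  h.slow_path = [&](uint32_t props) -> uint32_t {
    assert(props & FLAG_HTM_FAILED);  // policy code sees the failure bit
    return retries++ < max_retries ? RETRY_HTM : RUN_SOFTWARE;
  };
  uint32_t code = begin_transaction_entry(h, 0);
  return {code, attempts, restores};
}
```

In this model the retry policy can change freely inside slow_path
without touching the entry stub, and the policy code never runs before
restore_fp_regs has been called.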