On Fri, 2017-06-30 at 13:41 -0300, Breno Leitao wrote: > Thanks Gustavo for the patch. > > On Thu, Jun 29, 2017 at 08:39:23PM -0400, Gustavo Romero wrote: > > Currently tm_reclaim() can return with a corrupted vs0 (fp0) or vs32 (v0) > > due to the fact vs0 is used to save FPSCR and vs32 is used to save VSCR. > > > > Later, we recheckpoint trusting that the live state of FP and VEC are ok > > depending on the MSR.FP and MSR.VEC bits, i.e. if MSR.FP is enabled that > > means the FP registers checkpointed before entering are correct and so > > after a treclaim. we can trust the FP live state, and similarly on VEC regs. > > However if tm_reclaim() does not return a sane state then tm_recheckpoint() > > will recheckpoint a corrupted instate back to the checkpoint area. > > > > That commit fixes the corruption by restoring vs0 and vs32 from the > > ckfp_state and ckvr_state after they are used to save FPSCR and VSCR, > > respectively. > > > > The effect of the issue described above is observed, for instance, once a > > VSX unavailable exception is caught in the middle of a transaction with > > MSR.FP = 1 or MSR.VEC = 1. If MSR.FP = 1, then after getting back to user > > space FP state is corrupted. If MSR.VEC = 1, then VEC state is corrupted. > > > > The issue does occur if MSR.FP = 0 and MSR.VEC = 0 because ckfp_state and > > ckvr_state are both copied from fp_state and vr_state, respectively, and > > on recheckpointing both states will be restored from these thread > > structures and not from the live state. > > Just complementing what Gustavo said, currently the tm_recheckpoint() > function grabs the values to be re-checkpoint from two places, depending > on the MSR configuration. > > If VEC or FP are disabled on MSR argument when tm_recheckpoint() is > called, then it restorese the values from the thread ckvr/ckfp area and > recheckpoint these values into processor transactional area. > > On the other side, if VEC or FP are disabled, it does not restore the > vectors and fp registers from the thread ckvec/ckfp area, and it will > recheckpoint what is currently on the live registers. If the registers > changed after the reclaim, we will send these new values recheckpointed, > and later on userspace when the transaction fails. > > This second scenario is what is causing the error reported on this > email thread. In fact, I am wondering if we can hit another hidden bug that > changes the fp and vec values between the tm_reclaim() and > tm_recheckpoint() invokations, as for example, an interrupt that calls > memcpy() in this small mean time. > > That said, I am wondering if we shouldn't always rely on thread ckfp and > ckvec during tm_recheckpoint() (and never rely on the live registers). This > should simplify the reclaim/recheckpoint mechanism, and make it less > error prone.
I think you're absolutely correct - its a mess. Personally I think that the assembly functions should do the bare minimum. So the two cases that you describe which are handled in assembly could easily be handled in C either before the call to the assembly or after. Cyril