On 06/30/2016 10:36 AM, Andy Lutomirski wrote:
>>> We make baseline_pkru a process-wide baseline and store it in
>>> mm->context. That way, no matter which thread gets interrupted for a
>>> signal, they see consistent values. We only write to it when an app
>>> _specifically_ asks for it to be updated with a special flag to
>>> sys_pkey_set().
>>>
>>> When an app uses the execute-only support, we implicitly set the
>>> read-disable bit in baseline_pkru for the execute-only pkey.
...
> Looking at your git tree, which I assume is a reasonable approximation
> of your current patches, this seems to be unimplemented. I, at least,
> would be nervous about using PKRU for protection of critical data if
> signal handlers are unconditionally exempt.

I actually went ahead and implemented this using an extra 'flag' for
pkey_get/set(). I just left it out of this stage since I'm having
enough problems getting it in with the existing set of features. :)

I'm confident we can add this later with the flags we can pass to
pkey_get() and pkey_set().

> Also, the lazily allocated no-read key for execute-only is done in the
> name of performance, but it results in odd semantics. How much of a
> performance win is preserving the init optimization of PKRU in
> practice? (I.e. how much faster are XSAVE and XRSTOR?) I can't test
> because even my Skylake laptop doesn't have PKRU.

This is admittedly not the most realistic benchmark, but I ran Ingo's
FPU "measure.c" code on XSAVES/XRSTORS. It runs things in pretty tight
loops where everything is cache-hot.

The XSAVE instructions are monsters and I'm not super-confident in my
measurements, but I'm seeing XSAVES/XRSTORS take somewhere in the
neighborhood of 20-30 more cycles when PKRU is in play vs. not. This is
with completely cache-hot data, though.
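To make the baseline_pkru idea quoted at the top a bit more concrete, here is a rough user-space model of it. This is only a sketch, not the code from the patches: the struct names, the model_* helpers and the MODEL_PKEY_UPDATE_BASELINE flag are all made up for illustration. The only hardware fact it relies on is that PKRU carries two bits per key, an access-disable bit and a write-disable bit.

```c
#include <assert.h>
#include <stdint.h>

/* PKRU layout: two bits per protection key, 16 keys in a 32-bit register. */
#define PKRU_BITS_PER_PKEY 2
#define PKRU_AD_BIT 0x1u /* access disable (the "read-disable" bit above) */
#define PKRU_WD_BIT 0x2u /* write disable */

/* Hypothetical flag name; the real interface is not settled in this thread. */
#define MODEL_PKEY_UPDATE_BASELINE 0x1u

/* Stand-in for the process-wide state kept in mm->context. */
struct model_mm {
    uint32_t baseline_pkru;
};

/* Stand-in for a thread and its private PKRU register value. */
struct model_thread {
    struct model_mm *mm;
    uint32_t pkru;
};

/* Replace the two rights bits for one pkey inside a PKRU value. */
static uint32_t pkru_set_rights(uint32_t pkru, int pkey, uint32_t rights)
{
    assert(pkey >= 0 && pkey < 16);
    int shift = pkey * PKRU_BITS_PER_PKEY;

    pkru &= ~(0x3u << shift);
    pkru |= (rights & 0x3u) << shift;
    return pkru;
}

/*
 * Model of pkey_set(): always updates the calling thread's PKRU, and
 * touches the shared baseline only when the caller asks for it.
 */
static void model_pkey_set(struct model_thread *t, int pkey,
                           uint32_t rights, unsigned int flags)
{
    t->pkru = pkru_set_rights(t->pkru, pkey, rights);
    if (flags & MODEL_PKEY_UPDATE_BASELINE)
        t->mm->baseline_pkru =
            pkru_set_rights(t->mm->baseline_pkru, pkey, rights);
}

/* Execute-only support implicitly sets access-disable in the baseline. */
static void model_mark_execute_only(struct model_mm *mm, int pkey)
{
    assert(pkey >= 0 && pkey < 16);
    mm->baseline_pkru |= PKRU_AD_BIT << (pkey * PKRU_BITS_PER_PKEY);
}

/* PKRU a signal handler would start with: the baseline, for any thread. */
static uint32_t model_signal_entry_pkru(const struct model_thread *t)
{
    return t->mm->baseline_pkru;
}
```

The point of the model is that model_signal_entry_pkru() ignores the interrupted thread's own PKRU, so two threads sharing one mm always enter a handler with the same rights, and an ordinary model_pkey_set() call without the flag never changes what handlers see.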

