These patches defer FPU state loading until return to userspace.
This has the advantage of not clobbering the FPU state of one task
with that of another, when that other task only stays in kernel mode.
It also allows us to skip the FPU restore in kernel_fpu_end(), which
will help tasks that do multiple invocations of kernel_fpu_begin/end
without returning to userspace, for example KVM VCPU tasks.
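
As a rough illustration only, here is a small userspace model of the
deferral idea. It is not code from these patches: struct task,
fpregs_owner, switch_to() and return_to_user() are hypothetical names
made up for the sketch, and a single int stands in for the FPU registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a task; not the kernel's task_struct. */
struct task {
    int fpu_image;                 /* FPU state saved in memory */
};

static struct task *fpregs_owner;  /* task whose state is live in the "registers" */
static int fpregs;                 /* stand-in for the FPU registers */
static int restore_count;          /* number of register loads performed */

/*
 * Context switch: save the outgoing task's state if it is live in the
 * registers, but deliberately do not load the incoming task's state.
 */
static void switch_to(struct task *prev, struct task *next)
{
    assert(prev != NULL && next != NULL);
    if (fpregs_owner == prev)
        prev->fpu_image = fpregs;
}

/*
 * Return to userspace: only now load the task's FPU state, and only if
 * the registers do not already hold it.
 */
static void return_to_user(struct task *t)
{
    assert(t != NULL);
    if (fpregs_owner != t) {
        fpregs = t->fpu_image;
        fpregs_owner = t;
        restore_count++;
    }
}
```

In this model, switching from a user task to a task that stays in kernel
mode and back again performs no register load at all, because the first
task still owns the registers when it returns to userspace.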
We could also skip the restore of the KVM VCPU guest FPU state at
guest entry time, if it is still valid, but I have not implemented
that yet.
The code that loads FPU context directly into registers from user
space memory, or saves directly to user space memory, is wrapped
in a retry loop that ensures the FPU state is correctly set up
at the start, and verifies that it is still valid at the end.
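
The shape of such a retry loop can be sketched roughly as follows. This
is a userspace model under stated assumptions, not the code in these
patches: struct fpu_ctx, fpu_op_retry() and the simulated
load_from_user() operation are all hypothetical, and "preemption" is
modelled by the operation clearing a validity flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task FPU context for the sketch. */
struct fpu_ctx {
    bool regs_valid;   /* do the registers hold this task's state? */
    int  regs;         /* stand-in for the FPU registers */
    int  image;        /* state saved in memory */
};

typedef int (*fpu_op_fn)(struct fpu_ctx *ctx, void *arg);

/*
 * Run op with the FPU state set up at the start, and repeat it if the
 * state turns out to have been invalidated by the time op finishes.
 * Errors from op are returned to the caller without retrying.
 */
static int fpu_op_retry(struct fpu_ctx *ctx, fpu_op_fn op, void *arg,
                        int *attempts)
{
    assert(ctx != NULL && op != NULL && attempts != NULL);
    for (;;) {
        if (!ctx->regs_valid) {        /* set up the state at the start */
            ctx->regs = ctx->image;
            ctx->regs_valid = true;
        }
        ++*attempts;

        int err = op(ctx, arg);
        if (err)
            return err;
        if (ctx->regs_valid)           /* still valid at the end: done */
            return 0;
        /* The registers were lost while op ran; go around again. */
    }
}

/* Simulated operation: loads a value, losing the registers a few times. */
struct flaky_arg {
    int preemptions_left;
    int value;
};

static int load_from_user(struct fpu_ctx *ctx, void *arg)
{
    struct flaky_arg *a = arg;

    ctx->regs = a->value;
    if (a->preemptions_left > 0) {
        a->preemptions_left--;
        ctx->regs_valid = false;       /* simulated preemption */
    }
    return 0;
}
```

With two simulated preemptions, the operation runs three times before
the loop accepts the result.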
I have stress tested these patches with various FPU test programs,
and things seem to survive.
However, I have not found any good test suites that mix FPU
use and signal handlers. Close scrutiny of these patches would