* r...@redhat.com <r...@redhat.com> wrote:
> These patches defer FPU state loading until return to userspace.
> This has the advantage of not clobbering the FPU state of one task
> with that of another, when that other task only stays in kernel mode.
> It also allows us to skip the FPU restore in kernel_fpu_end(), which
> will help tasks that do multiple invocations of kernel_fpu_begin/end
> without returning to userspace, for example KVM VCPU tasks.
> We could also skip the restore of the KVM VCPU guest FPU state at
> guest entry time, if it is still valid, but I have not implemented
> that yet.
> The code that loads FPU context directly into registers from user
> space memory, or saves directly to user space memory, is wrapped
> in a retry loop that ensures the FPU state is correctly set up
> at the start, and verifies that it is still valid at the end.
> I have stress tested these patches with various FPU test programs,
> and things seem to survive.
> However, I have not found any good test suites that mix FPU
> use and signal handlers. Close scrutiny of these patches would
> be appreciated.
BTW, for the next version it would be nice to also have a benchmark that shows
the advantages (and proves that it's not causing measurable overhead elsewhere).
Either an FPU-aware extension to 'perf bench sched' or a separate 'perf bench'
suite would be nice.