From: Tom Herbert
> Sent: 05 May 2017 06:51
> To: Linux Kernel Network Developers
> Subject: SSE instructions for fast packet copy?
> 
> Hi,
> 
> I am thinking about the possibility of using SSE in kernel for
> speeding up the kernel memcpy particularly for copy to userspace
> memory, and maybe even using the string instructions (like if we
> supported regex in something like eBPF). AFAIK we don't use SSE in
> kernel because of xmm register state needing to be saved across
> context switch. However, if we start busy-polling a CPU in kernel on
> network queues then there might not be any context switches to worry
> about. In this model we'd want to enable SSE per CPU.
> 
> Has this ever been tried before? Is this at all feasible? :-) Is it
> possible to enable SSE for kernel for just one CPU? (I found CPUID
> will return SSE supported, but don't see how to enable other than
> -msse for compiling).

Not even worth thinking about.
With recent Intel CPUs 'rep movsb' is optimised in the hardware
(for cached memory) and will run as fast as any other copy.
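
For illustration, a minimal userspace sketch (not kernel code) of such a
copy. It assumes GCC or Clang inline asm; copy_rep_movsb() and
cpu_has_erms() are hypothetical names. The CPUID bit tested is the one
Intel documents for "Enhanced REP MOVSB/STOSB" (leaf 7, subleaf 0, EBX
bit 9), which I am assuming is the optimisation meant here. On non-x86
builds it falls back to memcpy() so that it still compiles.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#define HAVE_X86 1
#else
#define HAVE_X86 0
#endif

/* Returns 1 if the CPU advertises ERMS (CPUID leaf 7, subleaf 0, EBX bit 9). */
static int cpu_has_erms(void)
{
#if HAVE_X86
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;               /* leaf 7 not supported */
    return (ebx >> 9) & 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: copy n bytes with a single 'rep movsb'. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
#if HAVE_X86
    void *d = dst;

    /* RDI = destination, RSI = source, RCX = byte count. */
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);        /* non-x86 fallback so the sketch still builds */
#endif
    return dst;
}
```

A caller could test cpu_has_erms() once at start-up and only then prefer
copy_rep_movsb() over its existing copy routine.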

(There is a related fubar: memcpy_toio() is implemented as
memcpy(), which in turn becomes 'rep movsb', so it generates
repeated byte accesses to I/O memory.)
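
To show what avoiding the byte accesses would look like, here is a
userspace sketch that issues exactly one 32-bit store per word. This is
not the kernel's I/O API: copy_to_io32() is a hypothetical name, and a
plain array stands in for the device window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy to a device window using 32-bit stores only.
 * dst must be 4-byte aligned and len must be a multiple of 4.
 */
static void copy_to_io32(volatile uint32_t *dst, const void *src, size_t len)
{
    const unsigned char *s = src;

    assert(((uintptr_t)dst & 3) == 0 && (len & 3) == 0);
    for (size_t i = 0; i < len / 4; i++) {
        uint32_t w;

        memcpy(&w, s + 4 * i, sizeof w);  /* source is normal memory, may be unaligned */
        dst[i] = w;                       /* volatile: one 32-bit store per word */
    }
}
```

The volatile qualifier is what keeps the compiler from merging or
splitting the stores; a 'rep movsb' gives no such guarantee about access
width.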

I'm pretty sure the FP registers are 'lazy saved'.
The CPU's SSE registers (the entire FP register set) might
contain live values for a process that is now running on a different CPU.
If that process executes an FP instruction it will fault, and an IPI
is issued to get the registers written to the process's FP save area,
from where they can be loaded.
Any use of the SSE registers would have to interact correctly
with that IPI code.

        David
