Hi, I am thinking about using SSE in the kernel to speed up the kernel memcpy, particularly for copies to userspace memory, and maybe even using the string instructions (for example, if we supported regex in something like eBPF). AFAIK we don't use SSE in the kernel because the xmm register state would need to be saved across context switches. However, if we start busy-polling network queues on a CPU in the kernel, there might not be any context switches to worry about. In this model we'd want to enable SSE per CPU.
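To make the idea concrete, here is a rough userspace sketch of the kind of 16-byte copy loop I have in mind. It uses the SSE2 intrinsics from <emmintrin.h> and assumes an x86 build with SSE2 available; sse_copy is a made-up name for illustration, not an existing kernel function. It is not kernel code and it ignores the register save/restore question entirely.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_loadu_si128, _mm_storeu_si128 */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration only (userspace, not a kernel API). */
static void *sse_copy(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Move 16 bytes per iteration through an xmm register.
     * The unaligned load/store variants are used so that neither
     * pointer needs to be 16-byte aligned. */
    while (len >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_storeu_si128((__m128i *)d, v);
        d += 16;
        s += 16;
        len -= 16;
    }

    /* Copy the remaining tail (fewer than 16 bytes) the ordinary way. */
    if (len)
        memcpy(d, s, len);

    return dst;
}
```

Whether a loop like this would actually beat the existing copy routines is something I have not measured.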
Has this ever been tried before? Is it at all feasible? :-) Is it possible to enable SSE in the kernel for just one CPU? (I found that CPUID reports SSE as supported, but I don't see how to enable it other than compiling with -msse.)

Thanks,
Tom