+       /* Copy over the stack-frame */
+       cld
+       rep movsb

Ugh. This is going to be horrendous. Maybe not noticeable on modern
CPUs, but the whole 32-bit code is kind of pointless on a modern CPU.

At least use "rep movsl", which copies four bytes per iteration instead
of one. If the kernel stack isn't 4-byte aligned, you have bigger issues.

Indeed, "rep movs" has some setup overhead that makes it undesirable
for small sizes. In my testing, moving fewer than 128 bytes with
"rep movs" is a loss.
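
To make the two suggestions concrete, here is a rough user-space sketch in C,
not the actual entry code. It copies in 4-byte words and only uses
"rep movsl" above a size cutoff. The names copy_frame and REP_MOVS_MIN_BYTES
are made up for illustration, the 128-byte cutoff is just the figure quoted
above, and the sketch assumes both buffers are 4-byte aligned and the length
is a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cutoff, taken from the 128-byte figure quoted above.
 * The real break-even point depends on the CPU; measure before relying on it. */
#define REP_MOVS_MIN_BYTES 128

/*
 * Copy `bytes` bytes from src to dst.
 * Assumptions (not checked): both pointers are 4-byte aligned, the
 * regions do not overlap, and `bytes` is a multiple of 4.
 */
static void copy_frame(uint32_t *dst, const uint32_t *src, size_t bytes)
{
    size_t words = bytes / 4;

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    if (bytes >= REP_MOVS_MIN_BYTES) {
        /*
         * Large copy: let "rep movsl" move one 32-bit word per iteration.
         * No "cld" here because the C ABI guarantees the direction flag
         * is clear on function entry; hand-written entry code cannot
         * assume that.
         */
        __asm__ volatile("rep movsl"
                         : "+D"(dst), "+S"(src), "+c"(words)
                         :
                         : "memory");
        return;
    }
#endif

    /* Small copy (or non-x86 build): a plain word loop avoids the
     * setup cost of the string instruction. */
    while (words--)
        *dst++ = *src++;
}
```

In the patch itself the equivalent change would be converting the byte count
in the count register to a word count before the string move; whether a
size check is worth adding depends on how large the copied frame can get.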

