The system seems to call CPUID at startup and, for every multiversioned function, patch an offset in its dispatcher function. The dispatcher function is then nothing more than an indirect jump through a RIP-relative memory slot, e.g.:

    jmp QWORD PTR [rip+0x200bf2]

Each call therefore costs one extra indirect jump, which is as efficient as it gets short of using whole-program optimization.

-- Marco
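For anyone who wants to see the idea in source form, here is a rough hand-written analogue in C. It is only a sketch, not what the compiler actually emits: it assumes GCC or Clang (for `__builtin_cpu_supports` and the `target`/`constructor` attributes), and the names `sum`, `sum_generic`, `sum_avx2`, `resolve_sum` and `sum_impl` are made up for illustration. The CPU check runs once before `main`, and every later call goes through the stored pointer, which plays the role of the patched slot above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical baseline variant; compiled for the default target. */
static long sum_generic(const int *v, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) s += v[i];
    return s;
}

#if defined(__x86_64__) || defined(__i386__)
/* Hypothetical AVX2 variant: same source, but the compiler may use
 * AVX2 instructions when generating code for this function only. */
__attribute__((target("avx2")))
static long sum_avx2(const int *v, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) s += v[i];
    return s;
}
#endif

typedef long (*sum_fn)(const int *, size_t);

/* Picks a variant from the CPU's reported features. */
static sum_fn resolve_sum(void) {
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();  /* required before __builtin_cpu_supports in a constructor */
    if (__builtin_cpu_supports("avx2")) return sum_avx2;
#endif
    return sum_generic;
}

/* The "slot" that gets filled in at startup. */
static sum_fn sum_impl;

/* Runs before main(), so the feature check happens once. */
__attribute__((constructor))
static void init_dispatch(void) {
    sum_impl = resolve_sum();
}

/* Dispatcher: a single indirect call through the slot. */
long sum(const int *v, size_t n) {
    assert(sum_impl != NULL);
    return sum_impl(v, n);
}
```

Callers just use `sum(data, n)` and never see which variant was picked. With optimization enabled the compiler will typically turn the body of `sum` into a tail jump through `sum_impl`, which is close in shape to the `jmp` shown above, though I have not checked that it matches byte for byte.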