I saw your post about XGETBV (http://robert.ocallahan.org/2017/06/another-case-of-obscure-cpu.html), and it sounds like it could plausibly be a kernel bug. What kernel are you on?
I wonder whether CPUs have an optimization in which, if a given register set is in the init state but XINUSE=1, the CPU notices this when XRSTORS runs and clears XINUSE. If so, that would be rather unfortunate IMO.

Dave, why is XINUSE exposed to userspace at all? IIRC it's visible via XGETBV, XSAVEC, and maybe even XSAVE. ISTM that making it visible to userspace serves basically no performance purpose and just encourages annoying corner cases. I can see an argument that making XSAVEC fast is nice for user threading libraries. But user threading libraries that use XSAVEC are probably doing it wrong: they should rely on ABI guarantees to avoid saving and restoring extended state at all.

To be fair, glibc uses this new XGETBV feature (XGETBV with ECX=1, which returns XCR0 & XINUSE), but I suspect its usage is rather dubious. Shouldn't it just use XSAVEC directly rather than rolling its own code?
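For concreteness, here is a minimal sketch of how a userspace program can read XINUSE through XGETBV, assuming an x86 target and GCC/Clang (`<cpuid.h>` and inline asm). The helper names `xgetbv` and `read_xcr0_and_xinuse` are hypothetical and this is not glibc's code. The CPUID checks are the architectural ones: CPUID.1:ECX.OSXSAVE (bit 27) before using XGETBV at all, and CPUID.(EAX=0DH,ECX=1):EAX bit 2 before using XGETBV with ECX=1.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Hypothetical helper: execute XGETBV for the XCR selected by ecx. */
static uint64_t xgetbv(uint32_t ecx)
{
    uint32_t eax, edx;
    __asm__ volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(ecx));
    return ((uint64_t)edx << 32) | eax;
}
#endif

/*
 * Hypothetical helper. Reads XCR0 (state components the OS enabled) and,
 * if supported, XGETBV(1) = XCR0 & XINUSE (components not in init state).
 * Returns true only if both values were read.
 */
bool read_xcr0_and_xinuse(uint64_t *xcr0, uint64_t *xinuse)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* CPUID.1:ECX.OSXSAVE[bit 27]: OS set CR4.OSXSAVE, so XGETBV won't #UD. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 27)))
        return false;
    *xcr0 = xgetbv(0);

    /* CPUID.(EAX=0DH,ECX=1):EAX[bit 2]: XGETBV with ECX=1 is supported. */
    if (!__get_cpuid_count(0xd, 1, &eax, &ebx, &ecx, &edx) ||
        !(eax & (1u << 2)))
        return false;
    *xinuse = xgetbv(1);
    return true;
#else
    (void)xcr0;
    (void)xinuse;
    return false; /* not x86: no XGETBV */
#endif
}
```

Because the result is masked by XCR0, it can only report components the kernel enabled. But within that mask it tells userspace exactly which register sets are currently in their init state, and that is the visibility being questioned above.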

