Alexander Graf wrote:
>
> On Feb 25, 2008, at 6:40 PM, Avi Kivity wrote:
>
>> Alexander Graf wrote:
>>>
>>> The ebx store was done because of PIC code, which does not allow ebx
>>> to get clobbered. If we are not in PIC code, =r contains ebx as GPR
>>> though, so the assumption that ebx needs to be restored was wrong
>>> then. This new version only enables the store/restore code if i386
>>> and PIC code are used. There is no need to distinguish between
>>> x86_64 and i386 for the other cases.
>>>
>>> So does this version work?
>>>
>>
>> It probably will, but it seems fragile to depend on the details of
>> PIC. I committed something more generic:
>>
>> #ifdef __x86_64__
>> asm volatile("cpuid"
>> : "=a"(vec[0]), "=b"(vec[1]),
>> "=c"(vec[2]), "=d"(vec[3])
>> : "0"(function) : "cc");
>
> This code works fine for all targets, including i386. With PIC
> enabled, gcc registers the ebx registers and complains about this,
> thus errors out. This is the only special case I am aware of, so I
> doubt we should treat any case different from the "normal" case but
> the PIC one.
>
>>
>> #else
>> asm volatile("pusha \n\t"
>>
>> "cpuid \n\t"
>> "mov %%eax, 0(%1) \n\t"
>> "mov %%ebx, 4(%1) \n\t"
>> "mov %%ecx, 8(%1) \n\t"
>> "mov %%edx, 12(%1) \n\t"
>>
>> "popa"
>> : "a"(function), "S"(vec) : "memory", "cc");
>> #endif
>
> Basically #ifdef __x86_64__ is even wrong, as the problem is not that
> too many registers are being used, but that ebx is reserved and can't
> be saved/restored automatically.
>
> Furthermore I believe that the less assembler is used, the better the
> code looks. So for cases the snippet above is not required, why use
> it? Overusing assembler is imho exactly the reason the previous code
> broke.
>
I agree with all of this, but I think this case is an exception. gcc
doesn't behave well with many register constraints on i386 and the PIC
case shows things are not straightforward. I want something I can
forget about.
> There's one more thing I'd like to add here. Gcc optimizes really well
> when one lets it to. So for this exact case with -O2 used, there are
> no memory accesses. The vector is simply stored in 4 registers and
> thus no more movs are required.
>
Again I agree, but host_cpuid() is hardly an optimization target. you
can add a usleep(10000) there with no noticable effect.
btw the cpuid instruction execution time itself will likely overwhelm
any instructions around it (since it is microcoded).
--
Any sufficiently difficult bug is indistinguishable from a feature.
-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
kvm-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/kvm-devel