>>> On Mon, Feb 12, 2007 at  4:48 AM, in message <[EMAIL PROTECTED]>,
Avi Kivity <[EMAIL PROTECTED]> wrote: 
> Waba wrote:
>> It took me a while, but I figured it out... nearly!
>>
>> Everything SIGILLs after the fs-root service is started. Its start
>> method does several things, but the problematic bit is replacing the
>> libc with an optimised version (namely, /usr/lib/libc/libc_hwcap1.so.1,
>> which makes use of the SSE, MMX, CMOV, SEP and FPU instruction sets
>> according to file(1)).
>>
>> All these flags are indeed advertised in the CPUID (isainfo -v: sse2 sse
>> fxsr mmx cmov sep cx8 tsc fpu). If the amd_sysc bit had been present,
>> the hwcap2 version would have been selected by moe(1), I guess (adds
>> SSE2 support and replaces SEP by AMD_SYSC).
>>
>> Disabling the libc replacement in /lib/svc/method/fs-root entirely
>> works around the problem.
>>
>> Further investigating, I tricked ls(1) into using the optimised libc
>> through LD_LIBRARY_PATH and obtained a core. mdb(1) told me that the
>> culprit was hiding at libc`memset+0x74. And finally, dis(1) revealed
>> that the faulty instruction is "movups (%esp), %xmm0", an SSE instruction.
>> The %xmm0 register is apparently for storage purposes only, as the only
>> instructions used to access it are movups, movntps and movaps.
>>
>> At this point I hope that it makes a lot of sense to you, because I
>> have no idea why it works fine on Avi's Opteron, etc.
>>
>> Let me know if you need any additional debugging.
>>   
> 
> Let's look at the control registers at the time of the SIGILL.  Can you 
> reproduce the error with the attached patch and send dmesg?


Hi Avi,
  I have a sneaking suspicion that this has the same root cause as the #UD 
faults I have been seeing on SLES.  I wrote a program that takes MD5 sums of 
the pages of a running program's text sections so they can be compared.  I 
then compared the output for GRUB running on bare metal and as a KVM guest, 
and the two were identical (except for the text that is expected to change 
with relocation).  This was not what I was expecting, since we had speculated 
MMU corruption.  Admittedly the test is not conclusive, since the page 
mappings could well be different under the load of the target app's execution 
versus that of the delta program.  But I was hoping for a smoking gun ;)
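
A minimal sketch of this kind of page-wise comparison, in Python and not the 
original tool.  It assumes the text-section bytes have already been dumped into 
two buffers; the names page_digests and differing_pages and the 4096-byte page 
size are assumptions made for illustration.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size; adjust for the platform in question


def page_digests(text, page_size=PAGE_SIZE):
    """Return one MD5 hex digest per page of a text-section dump."""
    return [hashlib.md5(text[off:off + page_size]).hexdigest()
            for off in range(0, len(text), page_size)]


def differing_pages(a, b, page_size=PAGE_SIZE):
    """Return the indices of pages whose digests differ between two dumps.

    A page that exists in only one of the dumps counts as differing.
    """
    da = page_digests(a, page_size)
    db = page_digests(b, page_size)
    n = max(len(da), len(db))
    return [i for i in range(n)
            if i >= len(da) or i >= len(db) or da[i] != db[i]]


if __name__ == "__main__":
    # Hypothetical data: three zero-filled pages, one byte changed in page 1.
    bare_metal = bytes(3 * PAGE_SIZE)
    guest = bytearray(bare_metal)
    guest[PAGE_SIZE + 5] = 0x90
    print(differing_pages(bare_metal, bytes(guest)))  # prints [1]
```

Pages touched by relocation would show up in the result as well, so they 
would have to be filtered out by hand or by index.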

Note that I am seeing #UD in other apps as well (Firefox, for instance).  If 
there were a disparity between the advertised and actual CPUID flags, and SLES 
is using libraries that interpret the flags, that could explain the behavior 
here.  Note that GRUB is blowing up in libc for me as well.  I will explore a 
CPUID disparity as a possibility next and report back.  What I did notice is 
that KVM seems to report the CPU as an AMD, even though I am running on a 
Woodcrest.  I would speculate that the problem is that some AMD-specific flag 
(e.g. amd_sysc) is set when it should not be.
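
A rough sketch of how such a disparity check could look, assuming the raw 
CPUID register values have been captured separately on the host and in the 
guest (how they are captured is left out here).  The bit positions are the 
architectural ones for CPUID leaf 1 EDX and leaf 0x80000001 EDX; the helper 
names and the sample values are hypothetical.

```python
import struct

# CPUID leaf 1, EDX feature bits (names follow the isainfo -v output).
LEAF1_EDX = {"fpu": 0, "tsc": 4, "cx8": 8, "sep": 11, "cmov": 15,
             "mmx": 23, "fxsr": 24, "sse": 25, "sse2": 26}

# CPUID leaf 0x80000001, EDX bit 11: SYSCALL/SYSRET (amd_sysc on Solaris).
EXT_EDX_SYSCALL_BIT = 11


def vendor_string(ebx, edx, ecx):
    """Decode the CPUID leaf 0 vendor string (register order EBX, EDX, ECX)."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


def flags(edx, table=LEAF1_EDX):
    """Return the set of flag names whose bits are set in edx."""
    return {name for name, bit in table.items() if (edx >> bit) & 1}


def flag_disparity(host_edx, guest_edx):
    """Return the sorted flag names that differ between host and guest."""
    return sorted(flags(host_edx) ^ flags(guest_edx))


def has_syscall(ext_edx):
    """True if the SYSCALL/SYSRET bit is set in leaf 0x80000001 EDX."""
    return bool((ext_edx >> EXT_EDX_SYSCALL_BIT) & 1)


if __name__ == "__main__":
    # Hypothetical register dumps, chosen only to illustrate the check.
    print(vendor_string(0x68747541, 0x69746E65, 0x444D4163))  # AuthenticAMD
    host = sum(1 << b for b in LEAF1_EDX.values())
    guest = host & ~(1 << LEAF1_EDX["sse2"])
    print(flag_disparity(host, guest))  # prints ['sse2']
```

A guest that reports an AMD vendor string, or has the SYSCALL bit set, on an 
Intel host would be the kind of mismatch worth chasing.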

Note that I am currently being pulled off KVM work for about a week so I will 
be silent for a bit.

-Greg

_______________________________________________
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel