>>> On Mon, Feb 12, 2007 at 4:48 AM, in message <[EMAIL PROTECTED]>,
Avi Kivity <[EMAIL PROTECTED]> wrote:
> Waba wrote:
>> It took me a while, but I figured it out... nearly!
>>
>> Everything SIGILLs after the fs-root service is started. Its start
>> method does several things, but the problematic bit is replacing the
>> libc with an optimised version (namely, /usr/lib/libc/libc_hwcap1.so.1,
>> which makes use of the SSE, MMX, CMOV, SEP and FPU instruction sets
>> according to file(1)).
>>
>> All these flags are indeed advertised in the CPUID (isainfo -v: sse2 sse
>> fxsr mmx cmov sep cx8 tsc fpu). If the amd_sysc bit had been present,
>> the hwcap2 version would have been selected by moe(1), I guess (adds
>> SSE2 support and replaces SEP by AMD_SYSC).
>>
>> Disabling the libc replacement in /lib/svc/method/fs-root entirely
>> works around the problem.
>>
>> Further investigating, I tricked ls(1) into using the optimised libc
>> through LD_LIBRARY_PATH and obtained a core. mdb(1) told me that the
>> culprit was hiding at libc`memset+0x74. And finally, dis(1) revealed
>> that the faulty instruction is "movups (%esp), %xmm0", an SSE feature.
>> The %xmm0 register is apparently for storage purposes only, as the only
>> instructions used to access it are movups, movntps and movaps.
>>
>> At this point I hope that it makes a lot of sense to you, because I
>> have no idea why it works fine on Avi's Opteron, etc.
>>
>> Let me know if you need any additional debugging.
>>
>
> Let's look at the control registers at the time of the SIGILL. Can you
> reproduce the error with the attached patch and send dmesg?
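
For reference, one reason to look at the control registers here: an SSE instruction such as movups faults with #UD when CR0.EM is set or CR4.OSFXSR is clear, regardless of what CPUID advertises. The sketch below decodes raw CR0/CR4 values along those lines. The function name is hypothetical and the format of the patch's dmesg output is not known from this thread, so the values are assumed to be already extracted as integers.

```python
# Bit positions as documented in the x86 architecture manuals.
CR0_EM = 1 << 2      # x87 emulation; when set, SSE instructions raise #UD
CR0_TS = 1 << 3      # task switched; when set, SSE instructions raise #NM
CR4_OSFXSR = 1 << 9  # OS has enabled FXSAVE/FXRSTOR and SSE instructions


def sse_fault(cr0, cr4):
    """Return the exception an SSE instruction such as movups would
    raise for the given control register values, or None if it would
    execute. The #UD conditions are checked first in this sketch."""
    if cr0 & CR0_EM:
        return "#UD"
    if not cr4 & CR4_OSFXSR:
        return "#UD"
    if cr0 & CR0_TS:
        return "#NM"
    return None


if __name__ == "__main__":
    # Hypothetical values, for illustration only (not taken from a real dmesg).
    print(sse_fault(cr0=0x80050033, cr4=0x000000F0))
```

If the guest's CR4 turned out to have OSFXSR clear at the time of the SIGILL, that would be consistent with movups raising #UD even though the sse flag is advertised.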

Hi Avi,

I have a sneaking suspicion that this may have the same root cause as my
findings with #UD on SLES.

I wrote a program that takes MD5 sums of the pages of a running program's
text sections and compares them. I then compared the output for GRUB
running on bare metal and as a KVM guest, and they were identical (except
for the expected text that is affected by relocation). This was not what I
was expecting, since we speculated MMU corruption. Admittedly the test is
not conclusive, since the page mappings could surely be different under
the load of the target app's execution versus the delta program. But I was
hoping for a smoking gun ;)

Note that I am seeing #UD under other apps as well (Firefox, for
instance). If there were a disparity between the advertised and actual
CPUID flags, and SLES is using libraries that interpret the flags, that
could explain the behavior here. Note that grub is blowing up in libc for
me as well.

I will explore a CPUID disparity as a possibility next and report back.
What I did notice is that KVM seems to report the CPU as an AMD, even
though I am running on a Woodcrest. I would speculate that the problem is
that some AMD-specific flag (e.g. amd_sysc) is set when it should not be.

Note that I am currently being pulled off KVM work for about a week, so I
will be silent for a bit.

-Greg

_______________________________________________
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel
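
The per-page MD5 comparison described in the message above could be sketched roughly as follows. This is not Greg's actual program, which is not shown in the thread. It assumes the text-section bytes of each run have already been dumped into byte strings (how they are obtained is left out), and the names page_digests and differing_pages are hypothetical.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size; adjust for the platform


def page_digests(text, page_size=PAGE_SIZE):
    """Return one MD5 hex digest per page of a text-section dump."""
    return [
        hashlib.md5(text[off:off + page_size]).hexdigest()
        for off in range(0, len(text), page_size)
    ]


def differing_pages(text_a, text_b, page_size=PAGE_SIZE):
    """Return the indices of pages whose digests differ between two dumps.

    Pages present in only one of the dumps are reported as differing.
    """
    a = page_digests(text_a, page_size)
    b = page_digests(text_b, page_size)
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    diffs.extend(range(min(len(a), len(b)), max(len(a), len(b))))
    return diffs


if __name__ == "__main__":
    # Synthetic data standing in for a bare-metal dump and a guest dump.
    bare_metal = bytes(2 * PAGE_SIZE)
    guest = bytearray(bare_metal)
    guest[PAGE_SIZE + 10] = 0xFF  # corrupt one byte in the second page
    print(differing_pages(bare_metal, bytes(guest)))
```

Pages touched by relocation would show up as differences in such a comparison and would need to be excluded by hand, as noted in the message.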