On May 13, 2014, at 9:50 AM, John Nielsen <li...@jnielsen.net> wrote:

> On May 9, 2014, at 12:41 PM, John Nielsen <li...@jnielsen.net> wrote:
> 
>> On May 8, 2014, at 12:42 PM, Andrew Duane <adu...@juniper.net> wrote:
>> 
>>> From: owner-freebsd-hack...@freebsd.org 
>>> [mailto:owner-freebsd-hack...@freebsd.org] On Behalf Of John Nielsen
>>> 
>>>> On May 8, 2014, at 11:03 AM, John Baldwin <j...@freebsd.org> wrote:
>>>> 
>>>>> On Wednesday, May 07, 2014 7:15:43 pm John Nielsen wrote:
>>>>>> I am trying to solve a problem with amd64 FreeBSD virtual machines 
>>>>>> running on a Linux+KVM hypervisor. To be honest I'm not sure if the 
>>>>>> problem is in FreeBSD or the hypervisor, but I'm trying to rule out 
>>>>>> the OS first.
>>>>>> 
>>>>>> The _second_ time FreeBSD boots in a virtual machine with more than one 
>>>>>> core, the boot hangs just before the kernel would normally print e.g. 
>>>>>> "SMP: AP CPU #1 Launched!" (The last line on the console is "usbus0: 
>>>>>> 12Mbps Full Speed USB v1.0", but the problem persists even without 
>>>>>> USB). The VM will boot fine a first time, but running either 
>>>>>> "shutdown -r now" OR "reboot" will lead to a hung second boot. 
>>>>>> Stopping and starting the host qemu-kvm process is the only way to 
>>>>>> continue.
>>>>>> 
>>>>>> The problem seems to be triggered by something in the SMP portion of 
>>>>>> cpu_reset() (from sys/amd64/amd64/vm_machdep.c). If I hit the virtual 
>>>>>> "reset" button the next boot is fine. If I have 
>>>>>> 'kern.smp.disabled="1"' set for the initial boot then subsequent boots 
>>>>>> are fine (but I can only use one CPU core, of course). However, if I 
>>>>>> boot normally the first time then set 'kern.smp.disabled="1"' for the 
>>>>>> second (re)boot, the problem is triggered. Apparently something in the 
>>>>>> shutdown code is "poisoning the well" for the next boot.
>>>>>> 
>>>>>> The problem is present in FreeBSD 8.4, 9.2, 10.0 and 11-CURRENT as of 
>>>>>> yesterday.
>>>>>> 
>>>>>> This (heavy-handed and wrong) patch (to HEAD) lets me avoid the issue:
>>>>>> 
>>>>>> --- sys/amd64/amd64/vm_machdep.c.orig    2014-05-07 13:19:07.400981580 -0600
>>>>>> +++ sys/amd64/amd64/vm_machdep.c 2014-05-07 17:02:52.416783795 -0600
>>>>>> @@ -593,7 +593,7 @@
>>>>>> void
>>>>>> cpu_reset()
>>>>>> {
>>>>>> -#ifdef SMP
>>>>>> +#if 0
>>>>>>  cpuset_t map;
>>>>>>  u_int cnt;
>>>>>> 
>>>>>> I've tried skipping or disabling smaller chunks of code within the #if 
>>>>>> block but haven't found a consistent winner yet.
>>>>>> 
>>>>>> I'm hoping the list will have suggestions on how I can further narrow 
>>>>>> down the problem, or theories on what might be going on.
>>>>> 
>>>>> Can you try forcing the reboot to occur on the BSP (via 'cpuset -l 0 
>>>>> reboot') or a non-BSP ('cpuset -l 1 reboot') to see if that has any 
>>>>> effect?  It might not, but if it does it would help narrow down the 
>>>>> code to consider.
>>>> 
>>>> Hello jhb, thanks for responding.
>>>> 
>>>> I tried your suggestion but unfortunately it does not make any difference. 
>>>> The reboot hangs regardless of which CPU I assign the command to.
>>>> 
>>>> Any other suggestions?
>>> 
>>> When I was doing some early work on some of the Octeon multi-core chips, I 
>>> encountered something similar. If I remember correctly, there was an issue 
>>> in the shutdown sequence that did not properly halt the cores and set up 
>>> the "start jump" vector. So the first core would start, and when it tried 
>>> to start the next ones it would hang waiting for the ACK that they were 
>>> running (since they didn't have a start vector and hence never started). I 
>>> know MIPS, not AMD, so I can't say what the equivalent would be, but I'm 
>>> sure there is one. Check that part, setting up the early state.
>>> 
>>> If Juli and/or Adrian are reading this: do you remember anything about 
>>> that, something like 2 years ago?
>> 
>> That does sound promising, would love more details if anyone can provide 
>> them.
>> 
>> Here's another wrinkle:
>> 
>> The KVM machine in question is part of a cluster of identical servers 
>> (hardware, OS, software revisions). The problem is present on all servers in 
>> the cluster.
>> 
>> I also have access to a second homogeneous cluster. The OS and software 
>> revisions on this cluster are identical to the first. The hardware is 
>> _nearly_ identical--slightly different mainboards from the same manufacturer 
>> and slightly older CPUs. The same VMs (identical disk image and definition, 
>> including CPU flags passed to the guest) that have a problem on the first 
>> cluster work flawlessly on this one.
>> 
>> Not sure if that means the bad behavior only appears on certain CPUs or if 
>> it's timing-related or something else entirely. I'd welcome speculation at 
>> this point.
>> 
>> CPU details below in case it makes a difference.
>> 
>> == Problem Host ==
>> model name      : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx 
>> pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology 
>> nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx 
>> est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt 
>> tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt 
>> pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
>> 
>> == Good Host ==
>> model name      : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx 
>> pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology 
>> nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx 
>> est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt 
>> tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts 
>> dtherm tpr_shadow vnmi flexpriority ept vpid
> 
> Still haven't found a solution but I did learn something else interesting: an 
> ACPI reboot allows the system to come back up successfully. What is different 
> from the system or CPU point of view about an ACPI reboot versus running 
> "reboot" or "shutdown" from userland?

Following up on the off chance anyone else is interested. I installed -HEAD on 
a host that was having the problem ("v2" Xeon CPU) and ran a FreeBSD 9 VM under 
bhyve. The problem did _not_ occur there. That's not conclusive, but it does 
point the finger at QEMU a bit more strongly. I have filed a bug with them:
  https://bugs.launchpad.net/qemu/+bug/1329956

Still, if anyone has any ideas on what could be going on I'd love to hear them.
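
One thing that can be checked mechanically is how the host CPU flag lists 
quoted above for the "Problem Host" and "Good Host" differ. Below is a minimal 
sketch in C that does this. The names has_flag() and flags_only_in() are made 
up for this example, and the two strings are copied from the lists above. 
Nothing in this thread establishes that any of the differing flags is related 
to the hang (the guests are given the same CPU flags on both clusters), so 
treat the result only as a list of candidates to look at.

```c
#include <assert.h>
#include <string.h>

/* Host flag lists copied from the "Problem Host" and "Good Host" entries. */
const char PROBLEM_FLAGS[] =
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca "
    "cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx "
    "pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology "
    "nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx "
    "est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt "
    "tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt "
    "pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms";

const char GOOD_FLAGS[] =
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca "
    "cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx "
    "pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology "
    "nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx "
    "est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt "
    "tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts "
    "dtherm tpr_shadow vnmi flexpriority ept vpid";

/* Return 1 if the word flag[0..len) appears as a whole word in list. */
int has_flag(const char *list, const char *flag, size_t len)
{
    const char *p = list;

    while (*p != '\0') {
        size_t n = strcspn(p, " ");     /* length of the current word */
        if (n == len && strncmp(p, flag, len) == 0)
            return 1;
        p += n;
        p += strspn(p, " ");            /* skip separating spaces */
    }
    return 0;
}

/*
 * Collect the flags present in a but absent from b, space-separated, in out.
 * Returns the number of such flags. Flags that do not fit in out are still
 * counted but not written.
 */
int flags_only_in(const char *a, const char *b, char *out, size_t outsz)
{
    int count = 0;
    size_t used = 0;
    const char *p;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';

    p = a + strspn(a, " ");
    while (*p != '\0') {
        size_t n = strcspn(p, " ");
        if (!has_flag(b, p, n)) {
            size_t need = n + (used > 0 ? 1 : 0);
            if (used + need < outsz) {
                if (used > 0)
                    out[used++] = ' ';
                memcpy(out + used, p, n);
                used += n;
                out[used] = '\0';
            }
            count++;
        }
        p += n;
        p += strspn(p, " ");
    }
    return count;
}
```

Calling flags_only_in(PROBLEM_FLAGS, GOOD_FLAGS, buf, sizeof(buf)) with a 
128-byte buffer gives the flags that only the "v2" host has; swapping the 
first two arguments gives the ones only the older host has.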

JN

_______________________________________________
freebsd-virtualization@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
To unsubscribe, send any mail to 
"freebsd-virtualization-unsubscr...@freebsd.org"
