On Tue, Jul 19, 2011 at 10:04 AM, Avi Kivity <[email protected]> wrote:
> On 07/19/2011 12:51 AM, Stefan Hajnoczi wrote:
>>
>> On Mon, Jul 18, 2011 at 9:18 PM, Berg, Johannes<[email protected]>
>> wrote:
>> >> Today I encountered a "BUG: scheduling while atomic" from kvm.ko when
>> >> resuming the host from suspend-to-RAM. I captured as much of the oops
>> >> as
>> >> was displayed on screen:
>> >>
>> >> http://vmsplice.net/~stefan/panic1.jpg
>> >> http://vmsplice.net/~stefan/panic2.jpg
>> >>
>> >> It looks like the iwlagn driver may have crashed in an interrupt
>> >> handler and the
>> >> kvm-related panic was triggered in the aftermath. Any ideas?
>> >
>> > This doesn't look like iwlagn is involved at all -- the fact that it
>> > comes up in the backtrace seems to be an artifact of backtracing not being
>> > perfect. The RIP points to kvm_arch_vcpu_ioctl_run+0x927 and there's no
>> > reason to believe that iwlagn should crash kvm.
>>
>> RIP seems to be arch/x86/kvm/x86.c:vcpu_enter_guest():
>>
>> preempt_disable();
>>
>> kvm_x86_ops->prepare_guest_switch(vcpu);
>> if (vcpu->fpu_active)
>> kvm_load_guest_fpu(vcpu);
>> kvm_load_guest_xcr0(vcpu);
>>
>> vcpu->mode = IN_GUEST_MODE;
>>
>> /* We should set ->mode before check ->requests,
>> * see the comment in make_all_cpus_request.
>> */
>> smp_mb();
>>
>> local_irq_disable();
>>
>> if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
>> || need_resched() || signal_pending(current)) {
>> vcpu->mode = OUTSIDE_GUEST_MODE;
>> smp_wmb();
>> local_irq_enable();
>> preempt_enable();
>> kvm_x86_ops->cancel_injection(vcpu);
>> r = 1;
>> goto out;
>> }
>>
>> srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
>>
>> kvm_guest_enter();
>>
>> if (unlikely(vcpu->arch.switch_db_regs)) {
>> set_debugreg(0, 7);
>> set_debugreg(vcpu->arch.eff_db[0], 0);
>> set_debugreg(vcpu->arch.eff_db[1], 1);
>> set_debugreg(vcpu->arch.eff_db[2], 2);
>> set_debugreg(vcpu->arch.eff_db[3], 3);
>> }
>>
>> trace_kvm_entry(vcpu->vcpu_id);
>> kvm_x86_ops->run(vcpu);
>>
>> /*
>> * If the guest has used debug registers, at least dr7
>> * will be disabled while returning to the host.
>> * If we don't have active breakpoints in the host, we don't
>> * care about the messed up debug address registers. But if
>> * we have some of them active, restore the old state.
>> */
>> if (hw_breakpoint_active())
>> hw_breakpoint_restore();
>>
>> kvm_get_msr(vcpu, MSR_IA32_TSC,&vcpu->arch.last_guest_tsc);
>>
>> vcpu->mode = OUTSIDE_GUEST_MODE;
>> smp_wmb();
>> local_irq_enable(); /*<--- boom! */
>
> Preemption is still disabled at this point. Where does the "scheduling
> while atomic" come from? Nothing in this area attempts to schedule.
>
> The preemption counter is 0x10000100, indicating zero preempt depth (wrong
> for this point, should be 1), and 1 softirq depth (doesn't make much sense).
> Looks very wrong, like the preempt mixup that occured on some archs that
> are not x86_64 recently.
>
> Can you post some disassembly around %rip?
Here is the kvm.ko disassembly for %rip = kvm_arch_vcpu_ioctl_run+0x927.
I think at runtime the pvops bit should be patched and we should just
have a sti instruction?
I don't understand the backtrace. It prints out the interrupt handler
stack frames but %rip appears to be down in kvm.ko. If there is an
interrupt handler running shouldn't %rip be in there?
Unfortunately the full oops scrolled off the screen so I only have
panic1.jpg and panic2.jpg. We're missing earlier information. Is it
possible that there was another panic first and that kvm.ko is just
dying in the aftermath?
11436: 44 8b 6b 24 mov 0x24(%rbx),%r13d
1143a: 74 23 je 1145f
<kvm_arch_vcpu_ioctl_run+0x8dd>
1143c: 4c 8b 25 00 00 00 00 mov 0x0(%rip),%r12 #
11443 <kvm_arch_vcpu_ioctl_run+0x8c1>
1143f: R_X86_64_PC32 __tracepoint_kvm_entry+0x1c
11443: 4d 85 e4 test %r12,%r12
11446: 74 17 je 1145f
<kvm_arch_vcpu_ioctl_run+0x8dd>
11448: 49 8b 7c 24 08 mov 0x8(%r12),%rdi
1144d: 44 89 ee mov %r13d,%esi
11450: 41 ff 14 24 callq *(%r12)
11454: 49 83 c4 10 add $0x10,%r12
11458: 49 83 3c 24 00 cmpq $0x0,(%r12)
1145d: eb e7 jmp 11446
<kvm_arch_vcpu_ioctl_run+0x8c4>
1145f: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax #
11466 <kvm_arch_vcpu_ioctl_run+0x8e4>
11462: R_X86_64_PC32 kvm_x86_ops+0xfffffffffffffffc
11466: 48 89 df mov %rbx,%rdi
11469: ff 90 48 01 00 00 callq *0x148(%rax)
1146f: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax
11476: 00 00
11474: R_X86_64_32S cpu_dr7
11478: a8 aa test $0xaa,%al
1147a: 74 05 je 11481
<kvm_arch_vcpu_ioctl_run+0x8ff>
1147c: e8 00 00 00 00 callq 11481
<kvm_arch_vcpu_ioctl_run+0x8ff>
1147d: R_X86_64_PC32
hw_breakpoint_restore+0xfffffffffffffffc
11481: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax #
11488 <kvm_arch_vcpu_ioctl_run+0x906>
11484: R_X86_64_PC32 kvm_x86_ops+0xfffffffffffffffc
11488: 48 8b 54 24 50 mov 0x50(%rsp),%rdx
1148d: be 10 00 00 00 mov $0x10,%esi
11492: 48 89 df mov %rbx,%rdi
11495: ff 90 80 00 00 00 callq *0x80(%rax)
1149b: c7 43 2c 00 00 00 00 movl $0x0,0x2c(%rbx)
114a2: ff 14 25 00 00 00 00 callq *0x0
114a5: R_X86_64_32S pv_irq_ops+0x18
114a9: ff 83 b8 00 00 00 incl 0xb8(%rbx) <--- boom!
114af: 48 8b 44 24 40 mov 0x40(%rsp),%rax
114b4: 83 60 14 ef andl $0xffffffef,0x14(%rax)
114b8: 48 8b 3b mov (%rbx),%rdi
114bb: 48 83 c7 38 add $0x38,%rdi
114bf: e8 00 00 00 00 callq 114c4
<kvm_arch_vcpu_ioctl_run+0x942>
114c0: R_X86_64_PC32
__srcu_read_lock+0xfffffffffffffffc
114c4: 89 43 28 mov %eax,0x28(%rbx)
114c7: 83 3d 00 00 00 00 04 cmpl $0x4,0x0(%rip) #
114ce <kvm_arch_vcpu_ioctl_run+0x94c>
114c9: R_X86_64_PC32 prof_on+0xfffffffffffffffb
114ce: 75 28 jne 114f8
<kvm_arch_vcpu_ioctl_run+0x976>
114d0: be 10 00 00 00 mov $0x10,%esi
114d5: 48 89 df mov %rbx,%rdi
114d8: e8 46 bb ff ff callq d023 <kvm_register_read>
114dd: 83 3d 00 00 00 00 04 cmpl $0x4,0x0(%rip) #
114e4 <kvm_arch_vcpu_ioctl_run+0x962>
114df: R_X86_64_PC32 prof_on+0xfffffffffffffffb
114e4: 75 12 jne 114f8
<kvm_arch_vcpu_ioctl_run+0x976>
For comparison here are the disassembled "Code" bytes from the panic
message. I think the photo truncates the full output so the last
instruction is incomplete, but it matches up with 0x11488 above:
0: 99 cltd
1: b2 e0 mov $0xe0,%dl
3: 48 8b 05 f8 62 02 00 mov 0x262f8(%rip),%rax # 0x26302
a: 48 8b 54 24 50 mov 0x50(%rsp),%rdx
f: be 10 00 00 00 mov $0x10,%esi
14: 48 89 df mov %rbx,%rdi
17: ff .byte 0xff
18: 90 nop
19: 80 .byte 0x80
Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html