Avi Kivity wrote:
> Nikola Ciprich wrote:
>> Hello everybody,
>> we're running a cluster of two hosts with tens (~45 running) of KVMs,
>> and now I noticed that some nodes are losing link under heavy load.
>>
>> The following appears in dmesg:
>> [  422.077128] NETDEV WATCHDOG: eth0: transmit timed out
>> [  422.077215] eth0: Transmit timeout, status d 2b 5 80ff
>>
>> [EMAIL PROTECTED] ~]# cat /proc/interrupts
>>            CPU0       CPU1       CPU2       CPU3
>>   0:        144          0          0          0   IO-APIC-edge      timer
>>   1:        539          2          1          2   IO-APIC-edge      i8042
>>   9:          0          0          0          0   IO-APIC-fasteoi   acpi
>>  10:     756783     362345     372753     751385   IO-APIC-fasteoi   eth0
>>  11:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb1
>>  12:        150          4          3          4   IO-APIC-edge      i8042
>>  14:     518448     528815     172232     348704   IO-APIC-edge      ide0
>>  15:          0          0          0          0   IO-APIC-edge      ide1
>> NMI:          0          0          0          0   Non-maskable interrupts
>> LOC:     829179     775992     505151     458761   Local timer interrupts
>> RES:     115772      98143      88928      82099   Rescheduling interrupts
>> CAL:         73        166        138        160   Function call interrupts
>> TLB:     214586     255980      66806     278284   TLB shootdowns
>> TRM:          0          0          0          0   Thermal event interrupts
>> SPU:          0          0          0          0   Spurious interrupts
>> ERR:          0
>> MIS:       1261
>>
>> I guess the MIS value might be related to this. I have observed the problem
>> on 32-bit guests up to now, but that might be coincidence (the affected
>> guests are heavily used). It also seems that it *might* be related to SMP
>> guests.
>>
>> Hosts are running 2.6.26.2-x86_64 + kvm-72, guests 2.6.24, and are using
>> the rtl8139 virtual adapter. I'm not sure whether we had this problem with
>> older KVM versions (and thus whether this is a regression), as the usage of
>> the machines is growing constantly, so we may simply not have noticed it
>> before.
>>
>> I CAN try other virtual adapters as well, but both machines are production,
>> so I have to be a bit cautious when it comes to experimenting. I'll try to
>> prepare a testing environment where I can reproduce the problem.
>>
>> But in the meantime, is there some way I could debug the problem further,
>> but in a safe manner? I don't see anything related in either host's dmesg
>> or logfiles.
>
> What would be most useful is to verify that this reproduces reliably,
> and a recipe for us to try out.
>
> Also, how heavy is the load? Maybe it's so heavy that the guests don't get
> scheduled and really do time out. Does the network recover if you
> ifdown/ifup?
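One low-risk check that fits the "safe manner" constraint is to watch whether the MIS (APIC mismatch) counter from `/proc/interrupts` grows together with the transmit timeouts. The sketch below is a hedged illustration, not a confirmed diagnosis procedure: it extracts the MIS value with awk, and is demonstrated against a sample file built from the counters quoted above so it is self-contained; on a live host you would point `mis_count` at `/proc/interrupts` and sample it periodically around each timeout.

```shell
#!/bin/sh
# Extract the MIS (APIC mismatch) counter from a /proc/interrupts-style
# file. The "MIS:" line format is taken from the output quoted above.
mis_count() {
    awk '/^ *MIS:/ { print $2 + 0; exit }' "$1"
}

# Sample data copied from the original report, so the sketch runs anywhere.
cat > /tmp/interrupts.sample <<'EOF'
ERR:      0
MIS:   1261
EOF

mis_count /tmp/interrupts.sample    # prints 1261
```

If the delta between two samples taken around a watchdog event is consistently nonzero, that would support the poster's guess that MIS is related; a flat counter would point elsewhere.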
The same happened to us. An easy way to reproduce it was to create a new ISO image with Revisor when it uses kickstart files served from the given KVM guest's NFS server.
--
Levente	"Si vis pacem para bellum!"
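When trying a reproduction recipe like the one above, a simple way to confirm that the watchdog actually fired during the run is to count the timeout messages in the kernel log. This is a sketch under assumptions: the message format is copied from the original report, and it is demonstrated against a sample file so it is self-contained; on a live host you would feed it `dmesg` output instead.

```shell
#!/bin/sh
# Count NETDEV WATCHDOG transmit-timeout events for a given interface in a
# kernel-log file. Message format taken from the dmesg lines quoted above.
timeout_count() {
    grep -c "NETDEV WATCHDOG: $1: transmit timed out" "$2"
}

# Sample log built from the original report, so the sketch runs anywhere.
cat > /tmp/dmesg.sample <<'EOF'
[  422.077128] NETDEV WATCHDOG: eth0: transmit timed out
[  422.077215] eth0: Transmit timeout, status d 2b 5 80ff
EOF

timeout_count eth0 /tmp/dmesg.sample    # prints 1
```

Comparing the count before and after each load test gives the "reproduces reliably" evidence Avi asks for without changing anything on the production hosts.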
