Avi Kivity wrote:
> Nikola Ciprich wrote:
>> Hello everybody,
>> we're running a cluster of two hosts with tens (~45 running) of KVMs,
>> and now I noticed that some nodes are losing link under heavy load.
>>
>> The following appears in dmesg:
>> [  422.077128] NETDEV WATCHDOG: eth0: transmit timed out
>> [  422.077215] eth0: Transmit timeout, status d 2b 5 80ff
>>
>> [EMAIL PROTECTED] ~]# cat /proc/interrupts
>>            CPU0       CPU1       CPU2       CPU3
>>   0:        144          0          0          0   IO-APIC-edge      timer
>>   1:        539          2          1          2   IO-APIC-edge      i8042
>>   9:          0          0          0          0   IO-APIC-fasteoi   acpi
>>  10:     756783     362345     372753     751385   IO-APIC-fasteoi   eth0
>>  11:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb1
>>  12:        150          4          3          4   IO-APIC-edge      i8042
>>  14:     518448     528815     172232     348704   IO-APIC-edge      ide0
>>  15:          0          0          0          0   IO-APIC-edge      ide1
>> NMI:          0          0          0          0   Non-maskable interrupts
>> LOC:     829179     775992     505151     458761   Local timer interrupts
>> RES:     115772      98143      88928      82099   Rescheduling interrupts
>> CAL:         73        166        138        160   Function call interrupts
>> TLB:     214586     255980      66806     278284   TLB shootdowns
>> TRM:          0          0          0          0   Thermal event interrupts
>> SPU:          0          0          0          0   Spurious interrupts
>> ERR:          0
>> MIS:       1261
>>
>> I guess the MIS value might be related to this. I have observed the problem
>> on 32-bit guests up to now, but that might be coincidence (the affected
>> guests are heavily used). It also seems that it *might* be related to SMP
>> guests.
>>
>> Hosts are running 2.6.26.2-x86_64 + kvm-72, guests 2.6.24, and are using
>> the rtl8139 virtual adapter. I'm not sure whether we had this problem with
>> older KVM versions (and thus whether this is a regression), as the usage of
>> the machines is growing constantly, so we may simply not have noticed it
>> before.
>>
>> I CAN try other virtual adapters as well, but both machines are production,
>> so I have to be a bit cautious when it comes to experimenting. I'll try to
>> prepare a testing environment where I can reproduce the problem.
>>
>> But in the meantime, is there some way I could debug the problem further,
>> but in a safe manner? I don't see anything related in either host's dmesg
>> or logfiles.
>
> What would be most useful is to verify that this reproduces reliably,
> and a recipe for us to try out.
>
> Also, how heavy is the load? Maybe it's so heavy that the guests don't get
> scheduled and really do time out. Does the network recover if you
> ifdown/ifup?
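One low-risk check that fits the "safe manner" constraint is to watch whether the MIS (APIC mismatch) counter from `/proc/interrupts` grows together with the transmit timeouts. The sketch below is a hedged illustration, not a confirmed diagnosis procedure: it extracts the MIS value with awk, and is demonstrated against a sample file built from the counters quoted above so it is self-contained; on a live host you would point `mis_count` at `/proc/interrupts` and sample it periodically around each timeout.

```shell
#!/bin/sh
# Extract the MIS (APIC mismatch) counter from a /proc/interrupts-style
# file. The "MIS:" line format is taken from the output quoted above.
mis_count() {
    awk '/^ *MIS:/ { print $2 + 0; exit }' "$1"
}

# Sample data copied from the original report, so the sketch runs anywhere.
cat > /tmp/interrupts.sample <<'EOF'
ERR:      0
MIS:   1261
EOF

mis_count /tmp/interrupts.sample    # prints 1261
```

If the delta between two samples taken around a watchdog event is consistently nonzero, that would support the poster's guess that MIS is related; a flat counter would point elsewhere.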
The same happened to us. An easy way to reproduce it was to create a new ISO image with Revisor when it uses kickstart files served from the given KVM guest's NFS server.
--
Levente	"Si vis pacem para bellum!"
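When trying a reproduction recipe like the one above, a simple way to confirm that the watchdog actually fired during the run is to count the timeout messages in the kernel log. This is a sketch under assumptions: the message format is copied from the original report, and it is demonstrated against a sample file so it is self-contained; on a live host you would feed it `dmesg` output instead.

```shell
#!/bin/sh
# Count NETDEV WATCHDOG transmit-timeout events for a given interface in a
# kernel-log file. Message format taken from the dmesg lines quoted above.
timeout_count() {
    grep -c "NETDEV WATCHDOG: $1: transmit timed out" "$2"
}

# Sample log built from the original report, so the sketch runs anywhere.
cat > /tmp/dmesg.sample <<'EOF'
[  422.077128] NETDEV WATCHDOG: eth0: transmit timed out
[  422.077215] eth0: Transmit timeout, status d 2b 5 80ff
EOF

timeout_count eth0 /tmp/dmesg.sample    # prints 1
```

Comparing the count before and after each load test gives the "reproduces reliably" evidence Avi asks for without changing anything on the production hosts.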
