Hi Ben,

On Sunday, 08.08.2010, at 03:36 +0100, Ben Hutchings wrote:
> This is not the same bug as was originally reported, which is that
> virtio_net failed to retry refilling its RX buffer ring. That is
> definitely fixed. So I'm treating this as a new bug report, #592187.

Okay, thanks.

> > > I think you need to give your guests more memory.
> >
> > They all have between 512M and 2G - and it happens to all of them using
> > virtio_net, and none of them using rtl8139 as a network driver,
> > reproducibly.
>
> The RTL8139 hardware uses a single fixed RX DMA buffer. The virtio
> 'hardware' allows the host to write into RX buffers anywhere in guest
> memory. This results in very different allocation patterns.
>
> Please try specifying 'e1000' hardware, i.e. an Intel gigabit
> controller. I think the e1000 driver will have a similar allocation
> pattern to virtio_net, so you can see whether it also triggers
> allocation failures and a network stall in the guest.
>
> Also, please test Linux 2.6.35 in the guest. This is packaged in the
> 'experimental' suite.

I'll rig up a test machine (the crashes all occurred on production
guests, unfortunately) and report back.

> [...]
> > If it would be an OOM situation, wouldn't the OOM-killer be supposed to
> > kick in?
> [...]
>
> The log you sent shows failure to allocate memory in an 'atomic' context
> where there is no opportunity to wait for pages to be swapped out. The
> OOM killer isn't triggered until the system is running out of memory
> despite swapping out pages.

Ah, good to know, thanks!

> Also, I note that following the failure of virtio_net to refill its RX
> buffer ring, I see failures to allocate buffers for sending TCP ACKs.
> So the guest drops the ACKs, and that TCP connection will stall
> temporarily (until the peer re-sends the unacknowledged packets).
>
> I also see 'nfs: server fileserver.backup.TechFak.Uni-Bielefeld.DE not
> responding, still trying'. This suggests that the allocation failure in
> virtio_net has resulted in dropping packets from the NFS server. And it
> just makes matters worse as it becomes impossible to free memory by
> flushing out buffers over NFS!

This sounds quite bad.

This problem *seems* to be fixed by 2.6.32-19: we upgraded to that on a
different machine for host and guests, and an rsync of ~1TiB of data
didn't produce any page allocation failures using virtio. But I'd wait
for my tests with rsync/nfs and 2.6.32-18+e1000, 2.6.32-18+virtio,
2.6.32-19+virtio and 2.6.35+virtio before concluding that.

Thanks for taking the time to explain things!

-- 
Lukas