On Sat, 10 Apr 2010 01:48:24 +0100, Ben Hutchings <[email protected]> wrote:
> On Thu, 2010-04-08 at 12:41 -0400, micah anderson wrote:
> > On 2010-04-08, micah anderson wrote:
> > > On Wed, 2010-04-07 at 11:52 -0400, Micah Anderson wrote:
> > > > Package: linux-image-2.6.32-2-amd64
> > > > Version: 2.6.32-8~bpo50+1
> > > > Severity: important
> > > >
> > > > I'm running a tor exit node on a kvm instance, it runs for a little
> > > > while (between an hour and 3 days), doing 30-40mbit/sec and then
> > > > suddenly 'swapper: page allocation failure' happens, and the entire
> > > > networking stack of the kvm instance is dead. It stops responding on
> > > > the net completely. No ping in or out, no traffic can be observed
> > > > using tcpdump, the counters on the interface no longer change
> > > > (although the interface stays up).
> > > [...]
> > >
> > > It sounds like there might be a memory leak. Please send the contents
> > > of /proc/meminfo and /proc/slabinfo from a 'normal' state and the broken
> > > state.
> >
> > I noticed this time when it crashed something different that I had not
> > seen in previous 2.6.30/2.6.26 kernels:
> >
> > [ 7962.841287] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
> > [ 7962.841287] cache: kmalloc-1024, object size: 1024, buffer size: 1024,
> > default order: 1, min order: 0
> > [ 7962.841287] node 0: slabs: 606, objs: 4544, free: 0
> >
> > and then the normal:
> > [ 7963.102476] swapper: page allocation failure. order:0, mode:0x4020
> > [ 7963.105743] Pid: 0, comm: swapper Not tainted 2.6.32-bpo.2-amd64 #1
> > [ 7963.106418] Call Trace:
> > [ 7963.106418] <IRQ> [<ffffffff810b947d>] ?
> > __alloc_pages_nodemask+0x55b/0x5ce
> > etc.
> >
> > As requested here is a normal state /proc/meminfo and /proc/slabinfo. See
> > below for
> > the broken state
> [...]
>
> There's no sign of a memory leak and there's actually much more free
> memory in the broken state, perhaps because any network servers have
> lost all their clients and freed session state. My guess is that the
> driver just doesn't handle allocation failure gracefully. Which network
> driver are you using in the guest?

I started with virtio, then had a hunch that switching to e1000 might be
more stable, but sadly both produce the same results.
Here is the domain.xml:
<domain type='kvm'>
<name>wagtail.example.net</name>
<uuid>cfdd8232-be2f-4ac5-9cbd-dbc6f6956d77</uuid>
<memory>524288</memory>
<currentMemory>524288</currentMemory>
<vcpu>1</vcpu>
<os>
<type arch='x86_64' machine='pc'>hvm</type>
<boot dev='hd'/>
<boot dev='cdrom'/>
</os>
<features>
<acpi/>
</features>
<clock offset='utc'/>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>restart</on_crash>
<devices>
<emulator>/usr/bin/kvm</emulator>
<disk type='file' device='cdrom'>
<source file='/root/grub-rescue/grub-rescue.iso'/>
<target dev='hdc' bus='ide'/>
<readonly/>
</disk>
<disk type='block' device='disk'>
<source dev='/dev/disk/by-id/dm-name-khyber-micah_wagtail.example.net'/>
<target dev='vda' bus='virtio'/>
</disk>
<interface type='ethernet'>
<mac address='52:54:00:43:ae:3d'/>
<target dev='wagtail'/>
<script path='/bin/true'/>
<model type='e1000'/>
</interface>
<serial type='pty'>
<target port='0'/>
</serial>
<serial type='unix'>
<source mode='bind' path='/home/micah/wagtail.example.net/ttyS1'/>
<target port='1'/>
</serial>
<console type='pty'>
<target port='0'/>
</console>
<graphics type='vnc' autoport='true'/>
</devices>
</domain>
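
In case it is useful for double-checking which NIC model and how much
memory a guest is actually configured with, here is a minimal sketch using
only Python's standard xml.etree module. The function name and the
trimmed-down example XML are made up for illustration; the only thing
assumed about libvirt is that <memory> without a unit attribute is in KiB.

```python
import xml.etree.ElementTree as ET


def summarize_domain(xml_text):
    """Return (memory_kib, [NIC model types]) from a libvirt domain XML string.

    Hypothetical helper for illustration; not part of libvirt itself.
    """
    root = ET.fromstring(xml_text)
    # <memory> holds the guest RAM size; libvirt's default unit is KiB.
    memory_kib = int(root.findtext("memory"))
    # Each <interface> may carry a <model type='...'/> naming the emulated NIC.
    models = [m.get("type") for m in root.findall("./devices/interface/model")]
    return memory_kib, models


# Trimmed-down example input (hypothetical, not the full domain above).
EXAMPLE = """<domain type='kvm'>
  <memory>524288</memory>
  <devices>
    <interface type='ethernet'>
      <model type='virtio'/>
    </interface>
  </devices>
</domain>"""

if __name__ == "__main__":
    memory_kib, models = summarize_domain(EXAMPLE)
    print("memory: %d MiB, NIC models: %s" % (memory_kib // 1024, models))
```

Feeding it the output of "virsh dumpxml" for the guest should give the same
kind of summary, assuming the dump parses as plain XML.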

