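Bug#603632: PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))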

Cris Daniluk Tue, 23 Nov 2010 04:48:21 -0800

I was unable to, and this does indeed look similar. I tried a variety of
pvops kernels and kernel configs and was unable to get past this. I never
found a resolution and eventually fell back to 3.4.3 with a xenlinux kernel.
Much less sexy but very stable on the same hardware.


I also had related but different problems on IBM 3650 M2s and IBM 3500s with
pvops kernels. They seem very prone to crashing on any APIC/ACPI bugs, of
which there seem to be quite a few in both Dell and IBM firmware. I was
toying with the idea of downgrading the BIOSes, based on the success someone
else on the xen-devel list reported with that, but I didn't have the time to
see that idea through.

On Tue, Nov 23, 2010 at 6:51 AM, Ian Campbell <[email protected]> wrote:

> Thanks for the report Vincent.
>
> I've added xen-devel to the CC as well as Cris Daniluk who previously
> reported a very similar issue[0] also on an R410 -- Cris did you ever
> get a resolution to your issue?
>
> Vincent's full report is at:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=603632
> I've also attached the boot log here of which the interesting part looks
> to be:
>
>        [    8.422639] xen: acpi sci 9
>        [    8.434217] Console: colour VGA+ 80x25
>        [    8.441350] console [hvc0] enabled, bootconsole disabled
>        [    8.441350] console [hvc0] enabled, bootconsole disabled
>        [    8.462694] Xen: using vcpuop timer interface
>        [    8.471508] installing Xen timer for CPU 0
>        [    8.479841] BUG: unable to handle kernel paging request at 0000000000005a08
>        [    8.493868] IP: [<ffffffff810badce>] __alloc_pages_nodemask+0x8f/0x5f5
>        [    8.507041] PGD 0
>        [    8.511199] Thread overran stack, or stack corrupted
>        [    8.521253] Oops: 0000 [#1] SMP
>        [    8.527838] last sysfs file:
>        [    8.533941] CPU 0
>        [    8.538100] Modules linked in:
>        [    8.544342] Pid: 0, comm: swapper Not tainted 2.6.32-5-xen-amd64 #1 PowerEdge R410
>        [    8.559594] RIP: e030:[<ffffffff810badce>]  [<ffffffff810badce>] __alloc_pages_nodemask+0x8f/0x5f5
>        [    8.577620] RSP: e02b:ffffffff81443c88  EFLAGS: 00010046
>        [    8.588366] RAX: 0000000000000000 RBX: 0000000000005220 RCX: 0000000000005a00
>        [    8.602752] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000005220
>        [    8.617139] RBP: 0000000000004020 R08: 0000000000000002 R09: ffff88003fc1c010
>        [    8.631525] R10: ffffffff813c2700 R11: 00000000000186a0 R12: 0000000000005220
>        [    8.645910] R13: 0000000000000002 R14: 0000000000000000 R15: ffff88000000da28
>        [    8.660300] FS:  0000000000000000(0000) GS:ffff88000349b000(0000) knlGS:0000000000000000
>        [    8.676591] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
>        [    8.688203] CR2: 0000000000005a08 CR3: 0000000001001000 CR4: 0000000000002660
>        [    8.702589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>        [    8.716975] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>        [    8.731361] Process swapper (pid: 0, threadinfo ffffffff81442000, task ffffffff814771f0)
>        [    8.747654] Stack:
>        [    8.751813]  ffff88000000da00 00000010813c2765 00000000000212d0 00000000000186a0
>        [    8.766199] <0> ffff88000000ac10 ffffffff8100e5b5 ffffffff8100ec72 00000000000186a0
>        [    8.781625] <0> 00000000000186a0 0000000000000000 0000000000005a00 0000000000000000
>        [    8.797572] Call Trace:
>        [    8.802603]  [<ffffffff8100e5b5>] ? xen_force_evtchn_callback+0x9/0xa
>        [    8.815600]  [<ffffffff8100ec72>] ? check_events+0x12/0x20
>        [    8.826695]  [<ffffffff810e759d>] ? new_slab+0x42/0x1ca
>        [    8.837267]  [<ffffffff810e7915>] ? __slab_alloc+0x1f0/0x39b
>        [    8.848707]  [<ffffffff812f87d8>] ? irq_to_desc_alloc_node+0x96/0x195
>        [    8.861704]  [<ffffffff810e85cb>] ? __kmalloc_node+0xe8/0x146
>        [    8.873317]  [<ffffffff812f87d8>] ? irq_to_desc_alloc_node+0x96/0x195
>        [    8.886316]  [<ffffffff812f87d8>] ? irq_to_desc_alloc_node+0x96/0x195
>        [    8.899317]  [<ffffffff811f24df>] ? find_unbound_irq+0x67/0xae
>        [    8.911103]  [<ffffffff811f259e>] ? bind_virq_to_irq+0x78/0x126
>        [    8.923062]  [<ffffffff8100e5b5>] ? xen_force_evtchn_callback+0x9/0xa
>        [    8.936063]  [<ffffffff8100e8f6>] ? xen_timer_interrupt+0x0/0x18d
>        [    8.948368]  [<ffffffff811f29f6>] ? bind_virq_to_irqhandler+0x19/0x4a
>        [    8.961368]  [<ffffffff8100e884>] ? xen_setup_timer+0x55/0xaa
>        [    8.972982]  [<ffffffff81509a5e>] ? xen_time_init+0xaf/0xb5
>        [    8.984247]  [<ffffffff8150a491>] ? x86_late_time_init+0xa/0x10
>        [    8.996206]  [<ffffffff81506c3d>] ? start_kernel+0x348/0x3e8
>        [    9.007646]  [<ffffffff81508c7d>] ? xen_start_kernel+0x57c/0x581
>        [    9.019777] Code: d8 c1 e8 13 83 e0 01 09 44 24 64 41 89 dc 44 23 25 28 01 43 00 44 89 e2 83 e2 10 89 54 24 5c 74 05 e8 16 03 25 00 48 8b 4c 24 50 <48> 83 79 08 00 0f 84 30 05 00 00 83 e3 0f 48 8b 44 24 50 41 bf
>        [    9.057561] RIP  [<ffffffff810badce>] __alloc_pages_nodemask+0x8f/0x5f5
>        [    9.070909]  RSP <ffffffff81443c88>
>        [    9.078015] CR2: 0000000000005a08
>        [    9.084780] ---[ end trace a7919e7f17c0a725 ]---
>        [    9.094136] Kernel panic - not syncing: Attempted to kill the idle task!
>
> It's worth noting that the Debian kernels are based on
> e73f4955a821f850f5b88c32d12a81714523a95f (less the GPU fixes merged by
> bcf16b6b4f34fb40a7aaf637947c7d3bce0be671, which the Debian kernel
> maintainer chose to exclude).
>
> The baseline is slightly old but Debian is now pretty deeply frozen so a
> wholesale rebase is not possible. If either of you has run a more
> recent kernel, the result would be interesting to know.
>
> The actual crashing RIP corresponds to mm/page_alloc.c:1975 which is in
> __alloc_pages_nodemask:
>
>        /*
>         * Check the zones suitable for the gfp_mask contain at least one
>         * valid zone. It's possible to have an empty zonelist as a result
>         * of GFP_THISNODE and a memoryless node
>         */
>        if (unlikely(!zonelist->_zonerefs->zone))
>                return NULL;
>
> zonelist->_zonerefs is an array but looking at the disassembly and the
> register dump zonelist itself appears to be 0x5a00 which seems unlikely
> to be valid.
>
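
As a quick check of the reading above: the faulting instruction bytes in
the Code: line (<48> 83 79 08 00) decode to cmpq $0x0,0x8(%rcx), and RCX is
0x5a00 in the register dump, so the access lands at 0x5a08, which is the
CR2 value. The sketch below redoes that arithmetic with a simplified model
of the zonelist layout. The *_model structs and first_zone_field_addr() are
illustrative names only, and the layout (a cache pointer followed by the
_zonerefs array) is an assumption about the 2.6.32-era struct zonelist, not
a copy of the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model of the layout (ASSUMPTION, for illustration only).
 * 64-bit pointers are modelled as uint64_t so the offsets do not
 * depend on the machine this is compiled on.
 */
struct zoneref_model {
    uint64_t zone;      /* stands in for: struct zone *zone */
    int32_t  zone_idx;
};

struct zonelist_model {
    uint64_t zlcache_ptr;               /* stands in for a pointer member */
    struct zoneref_model _zonerefs[1];  /* the real array is longer */
};

/* The model puts the first zone pointer 8 bytes into the zonelist. */
static_assert(offsetof(struct zonelist_model, _zonerefs) == 8,
              "model expects _zonerefs at offset 8");

/* Address that zonelist->_zonerefs->zone reads for a given zonelist. */
static uint64_t first_zone_field_addr(uint64_t zonelist)
{
    return zonelist
         + offsetof(struct zonelist_model, _zonerefs)
         + offsetof(struct zoneref_model, zone);
}
```

With the RCX value from the dump, first_zone_field_addr(0x5a00) gives
0x5a08, which is consistent with zonelist itself being the bogus pointer.
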
> The zonelist ultimately comes from node, which is always passed as 0 in
> the outermost caller in this stack trace (find_unbound_irq calling
> irq_to_desc_alloc_node).
>
> I'm not sure but looking at the complete bootlog it looks as if the
> system may only have node==1 i.e. no 0 node which could plausibly lead
> to this sort of issue:
>        [    0.000000] Bootmem setup node 1 0000000000000000-0000000040000000
>        [    0.000000]   NODE_DATA [0000000000008000 - 000000000000ffff]
>        [    0.000000]   bootmap [0000000000010000 -  0000000000017fff] pages 8
>        [    0.000000] (8 early reservations) ==> bootmem [0000000000 - 0040000000]
>        [    0.000000]   #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
>        [    0.000000]   #1 [0003446000 - 0003465000]   XEN PAGETABLES ==> [0003446000 - 0003465000]
>        [    0.000000]   #2 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
>        [    0.000000]   #3 [0001000000 - 0001694994]    TEXT DATA BSS ==> [0001000000 - 0001694994]
>        [    0.000000]   #4 [00016b5000 - 0003244e00]          RAMDISK ==> [00016b5000 - 0003244e00]
>        [    0.000000]   #5 [0003245000 - 0003446000]   XEN START INFO ==> [0003245000 - 0003446000]
>        [    0.000000]   #6 [0001695000 - 000169532d]              BRK ==> [0001695000 - 000169532d]
>        [    0.000000]   #7 [0000100000 - 00002e0000]          PGTABLE ==> [0000100000 - 00002e0000]
>        [    0.000000] found SMP MP-table at [ffff8800000fe710] fe710
>        [    0.000000] Zone PFN ranges:
>        [    0.000000]   DMA      0x00000000 -> 0x00001000
>        [    0.000000]   DMA32    0x00001000 -> 0x00100000
>        [    0.000000]   Normal   0x00100000 -> 0x00100000
>        [    0.000000] Movable zone start PFN for each node
>        [    0.000000] early_node_map[2] active PFN ranges
>        [    0.000000]     1: 0x00000000 -> 0x000000a0
>        [    0.000000]     1: 0x00000100 -> 0x00040000
>        [    0.000000] On node 1 totalpages: 262048
>        [    0.000000]   DMA zone: 56 pages used for memmap
>        [    0.000000]   DMA zone: 483 pages reserved
>        [    0.000000]   DMA zone: 3461 pages, LIFO batch:0
>        [    0.000000]   DMA32 zone: 3528 pages used for memmap
>        [    0.000000]   DMA32 zone: 254520 pages, LIFO batch:31
>
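
The boot log excerpt is internally consistent with "everything is on node
1": both early_node_map ranges carry node id 1, and their lengths add up to
the reported totalpages. A small sketch of that sum follows. The struct and
function names are made up for the example, and the table is transcribed by
hand from the log above rather than read from a running kernel.

```c
#include <assert.h>
#include <stddef.h>

/* One active PFN range, as printed in the early_node_map lines. */
struct pfn_range {
    int nid;                  /* node id in the first column */
    unsigned long start_pfn;  /* inclusive */
    unsigned long end_pfn;    /* exclusive */
};

/* Transcribed from the log: "1: 0x00000000 -> 0x000000a0" etc. */
static const struct pfn_range early_node_map[] = {
    { 1, 0x00000000UL, 0x000000a0UL },
    { 1, 0x00000100UL, 0x00040000UL },
};

/* Sum the pages of every range that belongs to the given node. */
static unsigned long pages_on_node(const struct pfn_range *map,
                                   size_t n, int nid)
{
    unsigned long total = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        assert(map[i].end_pfn >= map[i].start_pfn);
        if (map[i].nid == nid)
            total += map[i].end_pfn - map[i].start_pfn;
    }
    return total;
}
```

For node 1 this gives 160 + 261888 = 262048 pages, matching the
"On node 1 totalpages: 262048" line, and for node 0 it gives 0.
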
> Perhaps we should be passing numa_node_id() (i.e. the current node)
> instead of node 0? There doesn't seem to be another obvious alternative to
> passing in an explicit node number to this callchain (some places cope
> with -1 but not this path AFAICT).
>
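
To make the node 0 versus current node point concrete, here is a toy
user-space model, not kernel code. node_table stands in for the per-node
data, setup_only_node1() mimics the boot log where only node 1 is set up,
and pick_alloc_node() is a hypothetical helper showing one fallback policy
(use the requested node if it is online, otherwise the first online one).
That policy is a variant of the numa_node_id() suggestion, not what any
kernel actually does here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4  /* hypothetical small limit for the sketch */

struct node_info {
    int online;      /* non-zero once the node has been set up */
};

/* Toy per-node table; a NULL slot means the node was never set up. */
static struct node_info *node_table[MAX_NODES];
static struct node_info node1_storage;

/* Mimic the boot log above: only node 1 exists, node 0 does not. */
static void setup_only_node1(void)
{
    size_t i;

    for (i = 0; i < MAX_NODES; i++)
        node_table[i] = NULL;
    node1_storage.online = 1;
    node_table[1] = &node1_storage;
}

/* Return the lowest online node id, or -1 if no node is online. */
static int first_online_node_id(void)
{
    int nid;

    for (nid = 0; nid < MAX_NODES; nid++)
        if (node_table[nid] != NULL && node_table[nid]->online)
            return nid;
    return -1;
}

/*
 * Keep the requested node when it is usable; otherwise (including a
 * "no preference" value such as -1) fall back to an online node.
 */
static int pick_alloc_node(int requested)
{
    if (requested >= 0 && requested < MAX_NODES &&
        node_table[requested] != NULL && node_table[requested]->online)
        return requested;
    return first_online_node_id();
}
```

After setup_only_node1(), a caller that hard-codes node 0 would be handed
an empty slot, while pick_alloc_node(0) returns 1.
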
> It's also not obvious whether dom0 should be seeing the tables which
> describe the host's nodes anyway or if we should be clobbering something.
> Given that dom0 sees a pseudo-physical address map I'm not convinced seeing
> the real SRAT is in any way beneficial. Perhaps we should simply be
> clobbering NUMAness until actual PV understanding of NUMA is ready?
>
> One thing I notice when googling R410 issues is that they apparently
> have a "Cores per CPU" BIOS option which might be worth playing with,
> since configuring a reduced number of cores might remove node 0 but not
> node 1 (odd but not invalid?). Presumably it is also worth making sure
> you have the latest BIOS etc.
>
> It's very much an outside possibility but it is also worth trying the
> packages at http://xenbits.xen.org/people/ianc/ which reinstate the
> changesets from bcf16b6b4f34fb40a7aaf637947c7d3bce0be671.
>
> Ian.
>
> [0]
> http://lists.xensource.com/archives/html/xen-devel/2010-06/msg01140.html
>
> On Tue, 2010-11-16 at 00:32 +0100, Vincent CARON wrote:
> > Package: linux-image-2.6.32-5-xen-amd64
> > Version: 2.6.32-27
> > Severity: important
> >
> > I just tried d-i 6beta1 and booted Squeeze and its 2.6.32 kernel for
> > the first time on my usual server hardware (Dell R410).
> >
> > I opted for the xen-amd64 kernel, and it boots fine on bare metal. But
> > as soon as I tried to boot it as dom0 over Xen hypervisor, it BUG's:
> >
> > [    8.479841] BUG: unable to handle kernel paging request at 0000000000005a08
> > [    8.493868] IP: [<ffffffff810badce>] __alloc_pages_nodemask+0x8f/0x5f5
> >
> >   Then quickly oopses and panics. I tried various flags:
> >   - upping dom0_mem from 256M to 1024M (I've been running Lenny/Xen 3.2
> >     with 256M happily for several months on the same hw)
> >   - using Xen 'nommu'
> >   - using Linux nomodeset
> >
> >   Then I followed instructions on a Xen wiki page to provide verbose
> >   traces (although they do not look much more verbose than the regular
> >   boot).
> >
> >   I'm using an IPMI serial-over-lan console which appears as a regular
> >   UART to Xen.
> >
> >   I'm attaching a boot log to this report.
> >
> > -- System Information:
> > Debian Release: squeeze/sid
> >   APT prefers testing
> >   APT policy: (500, 'testing')
> > Architecture: amd64 (x86_64)
> >
> > Kernel: Linux 2.6.32-5-amd64 (SMP w/2 CPU cores)
> > Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
> > Shell: /bin/sh linked to /bin/bash
> >
> >
> >
>
> --
> Ian Campbell
> Current Noise: Wolf - Seize The Night
>
> If you will practice being fictional for a while, you will understand that
> fictional characters are sometimes more real than people with bodies and
> heartbeats.
>

Bug#603632: PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))