On Tue, Feb 11, 2014 at 6:53 AM, Dave Anderson <[email protected]> wrote:
>
> ----- Original Message -----
> > Dave Anderson reached out and wrote:
> >
> > ----- Original Message -----
> > > [root@kvm7 127.0.0.1-2014-02-07-19:17:09]# crash
> > > /boot/System.map-2.6.32-220.el6.x86_64.debug
> > > /usr/lib/debug/lib/modules/2.6.32-220.el6.x86_64.debug/vmlinux vmcore
> > >
> > > crash 5.1.8-1.el6
> > > Copyright (C) 2002-2011 Red Hat, Inc.
> > > Copyright (C) 2004, 2005, 2006 IBM Corporation
> > > Copyright (C) 1999-2006 Hewlett-Packard Co
> > > Copyright (C) 2005, 2006 Fujitsu Limited
> > > Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> > > Copyright (C) 2005 NEC Corporation
> > > Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> > > Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> > > This program is free software, covered by the GNU General Public License,
> > > and you are welcome to change it and/or distribute copies of it under
> > > certain conditions. Enter "help copying" to see the conditions.
> > > This program has absolutely no warranty. Enter "help warranty" for details.
> > >
> > > GNU gdb (GDB) 7.0
> > > Copyright (C) 2009 Free Software Foundation, Inc.
> > > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > > This is free software: you are free to change and redistribute it.
> > > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> > > and "show warranty" for details.
> > > This GDB was configured as "x86_64-unknown-linux-gnu"...
> > >
> > > crash: page excluded: kernel virtual address: ffffffff81542000 type: "cpu_possible_mask"
> > >
> > > I can go into minimal,
> > >
> > > nm -Bn /usr/lib/debug/lib/modules/2.6.32-220.el6.x86_64.debug/vmlinux | grep _stext
> > > ffffffff81000198 T _stext
> > >
> > > cat /proc/kallsyms | grep _stext
> > > ffffffff81000198 T _stext
> > >
> > > If I use the System Map parm I get this warning
> > >
> > > WARNING: kernels compiled by different gcc versions:
> > > /usr/lib/debug/lib/modules/2.6.32-220.el6.x86_64.debug/vmlinux: 4.4.5
> > > vmcore kernel: 4.4.6
> > >
> > > Would really like to understand why this system crashed. I know I'm a bit
> > > behind on my kernel versions however, but I should be able to look at this
> > > kernel??
> > >
> > > Thanks
> > > Tory
> >
> > It looks like the vmcore and vmlinux file don't match, like maybe the crashing
> > system was running the standard 2.6.32-220.el6.x86_64 kernel, and you're trying
> > to debug it using the 2.6.32-220.el6.x86_64.debug kernel variant?
> >
> > First thing -- *never* use a System.map file unless for some reason you don't
> > have the original kernel's vmlinux available *and* you feel that the vmlinux
> > file you have is very close to the crashing kernel's vmlinux. But with any
> > RHEL standard (unmodified) vmlinux/vmcore setup, the System.map is completely
> > useless.
> >
> > So the first question is: what kernel generated the vmcore?
> >
> > Do this:
> >
> > $ strings vmcore | grep '2.6.32'
> >
> > Dave
> >
> >
> > --
> > Dave you are right, I thought I had to use the devel kernel and in fact my
> > system is not running that, so it crashed with the standard
> > 2.6.32-220.el6.x86_64 kernel.
> >
> > [tblue@kvm7 127.0.0.1-2014-02-07-19:17:09]$ sudo strings vmcore | grep '2.6.32'
> >
> > 2.6.32-220.el6.x86_64
> > OSRELEASE=2.6.32-220.el6.x86_64
> >
> > But it won't take my vmlinux from /boot
> >
> > crash: /boot/vmlinuz-2.6.32-220.el6.x86_64: not a supported file format
>
> There is no vmlinux file in /boot. The "vmlinuz" (with-a-z) file is not usable.
> You will always need the vmlinux from the associated kernel-debuginfo rpm.
>
> > Yes sir you were correct, I was using the wrong kernel!
> >
> > please wait... (determining panic task)
> > WARNING: multiple active tasks have called die
> >
> > KERNEL: /usr/lib/debug/lib/modules/2.6.32-220.el6.x86_64/vmlinux
> > DUMPFILE: /libvirt/crash/127.0.0.1-2014-02-07-19:17:09/vmcore [PARTIAL DUMP]
> > CPUS: 32
> > DATE: Fri Feb 7 18:16:05 2014
> > UPTIME: 226 days, 21:36:13
> > LOAD AVERAGE: 2.42, 2.68, 2.69
> > TASKS: 816
> > NODENAME: kvm7.domain.com
> > RELEASE: 2.6.32-220.el6.x86_64
> > VERSION: #1 SMP Tue Dec 6 19:48:22 GMT 2011
> > MACHINE: x86_64 (2200 Mhz)
> > MEMORY: 88 GB
> > PANIC: ""
> > PID: 0
> > COMMAND: "swapper"
> > TASK: ffff881665514b40 (1 of 32) [THREAD_INFO: ffff880c6124e000]
> > CPU: 19
> > STATE: TASK_RUNNING (PANIC)
> >
> > Nothing stands out as a bug or reason to fail
> >
> > divide error: 0000 [#1] SMP
> > last sysfs file: /sys/devices/system/cpu/cpu31/cache/index2/shared_cpu_map
> > CPU 19
> > Modules linked in: ext3 jbd ip6table_filter ip6_tables ebtable_nat ebtables
> > ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state
> > nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables
> > sunrpc bridge stp llc bonding ipv6 vhost_net macvtap macvlan tun kvm_intel
> > kvm cdc_ether usbnet mii microcode i2c_i801 i2c_core iTCO_wdt
> > iTCO_vendor_support shpchp igb ioatdma dca ses enclosure sg ext4 mbcache
> > jbd2 sr_mod cdrom sd_mod crc_t10dif ahci megaraid_sas dm_mirror
> > dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
> >
> > Pid: 0, comm: swapper Not tainted 2.6.32-220.el6.x86_64 #1 IBM System x3650
> > M4 -[7915AC1]-/00J6528
> > RIP: 0010:[<ffffffff81054ad5>] [<ffffffff81054ad5>] find_busiest_group+0x5c5/0xb20
> > RSP: 0018:ffff880028363c40 EFLAGS: 00010246
> > RAX: 0000000000000000 RBX: ffff880028363e64 RCX: 0000000000000000
> > RDX: 0000000000000000 RSI: ffff8800282cf540 RDI: ffff8800282d5fc0
> > RBP: ffff880028363dd0 R08: ffff8800282cf860 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000001 R12: 00000000ffffff01
> > R13: 0000000000015fc0 R14: ffffffffffffffff R15: 0000000000000000
> > FS: 0000000000000000(0000) GS:ffff880028360000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> > CR2: 00007f4e5215c000 CR3: 00000011bea54000 CR4: 00000000000426e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Process swapper (pid: 0, threadinfo ffff880c6124e000, task ffff881665514b40)
> > Stack:
> > ffff880028363d70 ffff880028363ce0 ffff880028363ca0 000000000000024d
> > <0> ffff8800282cf860 ffff880028363e58 0101881664b121a8 0000000600000000
> > <0> 0000000600000000 ffff8800282cf540 0000000123386cc0 0000000000000008
> > Call Trace:
> > <IRQ>
> > [<ffffffffa02e4669>] ? br_handle_frame_finish+0x179/0x2a0 [bridge]
> > [<ffffffff8105fc52>] rebalance_domains+0x1a2/0x5b0
> > [<ffffffff81060153>] run_rebalance_domains+0xf3/0x160
> > [<ffffffff8107c4f0>] ? get_next_timer_interrupt+0x1b0/0x250
> > [<ffffffff81072161>] __do_softirq+0xc1/0x1d0
> > [<ffffffff81097e0a>] ? sched_clock_idle_wakeup_event+0x1a/0x20
> > [<ffffffff8100c24c>] call_softirq+0x1c/0x30
> > [<ffffffff8100de85>] do_softirq+0x65/0xa0
> > [<ffffffff81071f45>] irq_exit+0x85/0x90
> > [<ffffffff8102a255>] smp_call_function_single_interrupt+0x35/0x40
> > [<ffffffff8100bdb3>] call_function_single_interrupt+0x13/0x20
> > <EOI>
> > [<ffffffff812c4a5e>] ? intel_idle+0xde/0x170
> > [<ffffffff812c4a41>] ? intel_idle+0xc1/0x170
> > [<ffffffff813f9f47>] cpuidle_idle_call+0xa7/0x140
> > [<ffffffff81009e06>] cpu_idle+0xb6/0x110
> > [<ffffffff814e5f23>] start_secondary+0x202/0x245
> > Code: d0 b8 01 00 00 00 48 c1 ea 0a 48 85 d2 0f 45 c2 41 89 40 08 66 90 4c 8b
> > 85 e0 fe ff ff 48 8b 45 a8 31 d2 41 8b 48 08 48 c1 e0 0a <48> f7 f1 48 8b 4d
> > b0 48 89 45 a0 31 c0 48 85 c9 74 0c 48 8b 45
> > RIP [<ffffffff81054ad5>] find_busiest_group+0x5c5/0xb20
> > RSP <ffff880028363c40>
> >
> > Is there a forum that would help me figure out what exactly caused this crash
> > as it's not the first time, across this series of servers running KVM
> >
> > Thank you sir,
> >
> > Tory
>
> From the information above, there was a divide-by-zero fault in find_busiest_group().
> If you ran the "bt" command on the panic task, it might be a little more obvious,
> but the "divide error: 0000 [#1] SMP" string comes from the divide_error() function.
>
> Anyway, you are running 2.6.32-220.el6, and from a more recent kernel.spec changelog,
> this issue was fixed in 2.6.32-248.el6:
>
> * Tue Mar 06 2012 Aristeu Rozanski <[email protected]> [2.6.32-248.el6]
> - [netdrv] bnx2: revert firmware load modifications (Neil Horman) [720428]
> - [virt] virtio: balloon: leak / fill balloon across S4 (Amit Shah) [798583]
> - [scsi] silencing 'killing requests for dead queue' (David Milburn) [798672]
> - [scsi] sd_dif: fix setting bio flags (Jeff Moyer) [799075]
> - [scsi] megaraid_sas: driver update to version 00.00.06.14-rh1 (Tomas Henzl) [749923]
> - [infiniband] srp: fix include ordering issue (Doug Ledford) [791209]
> - [sched] Fix Kernel divide by zero panic in find_busiest_group() (Larry Woodman) [785959]
>
> Time to upgrade...
>
> Dave

You are again absolutely right, I appreciate the time and the assistance.
Scheduling the upgrades!

Thanks!
Tory
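
The two checks that resolved this thread (which kernel produced the vmcore, and whether that kernel predates the 2.6.32-248.el6 build named in the changelog) can be sketched as a few POSIX shell helpers. This is only a rough sketch: the function names rhel6_build, needs_sched_fix and debuginfo_vmlinux are made up for illustration, the debuginfo path simply mirrors the KERNEL: line shown above, and the build-number comparison does not account for fixes that may have been backported to update streams.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of crash or rpm).

# Print the build number of a RHEL 6 style release string,
# e.g. "2.6.32-220.el6.x86_64" prints "220".
rhel6_build() {
    build=${1#*-}        # drop the upstream version, leaving "220.el6.x86_64"
    build=${build%%.*}   # keep only the leading numeric field
    printf '%s\n' "$build"
}

# Succeed (exit status 0) if the release is older than build 248, the build
# the quoted changelog lists for the find_busiest_group() divide-by-zero fix.
needs_sched_fix() {
    [ "$(rhel6_build "$1")" -lt 248 ]
}

# Path where the matching vmlinux was found in this thread once the
# kernel-debuginfo rpm for the crashing kernel was installed.
debuginfo_vmlinux() {
    printf '/usr/lib/debug/lib/modules/%s/vmlinux\n' "$1"
}

# Possible usage against a real dump (not run here, needs a vmcore):
#   release=$(strings vmcore | sed -n 's/^OSRELEASE=//p' | head -n 1)
#   needs_sched_fix "$release" && echo "kernel predates 2.6.32-248.el6"
#   crash "$(debuginfo_vmlinux "$release")" vmcore
```

Taking the release from the OSRELEASE= string in the vmcore, rather than from whatever kernel packages happen to be installed, avoids the vmlinux/vmcore mismatch that started the thread.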
--
Crash-utility mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/crash-utility
