It looks like I am getting a kernel bug on 64-bit Xen Debian under similar
conditions, i.e., when running drbd-verify.
It happens on both cluster nodes.
Kernel 2.6.26-2-xen-amd64, DRBD 8.3.5 compiled from the Debian unstable
package for 8.3.4.
For anyone interested, here is the stack trace.
BR,
Ivars
Nov 16 03:00:29 ariel kernel: [31375.026193] BUG: unable to handle
kernel NULL pointer dereference at 0000000000000016
Nov 16 03:00:29 ariel kernel: [31375.026288] IP: [<ffffffffa02f9169>]
:drbd:drbd_connector_callback+0x32/0x181
Nov 16 03:00:29 ariel kernel: [31375.026359] PGD 164c4067 PUD 170d1067
PMD 0
Nov 16 03:00:29 ariel kernel: [31375.026423] Oops: 0000 [1] SMP
Nov 16 03:00:29 ariel kernel: [31375.026474] CPU 0
Nov 16 03:00:29 ariel kernel: [31375.026512] Modules linked in:
xt_physdev iptable_filter ip_tables x_tables sha1_generic drbd cn
iscsi_trgt crc32c libcrc32c ipv6 bridge xfs w83627ehf lm85 hwmon_vid
netconsole configfs xenblktap netloop softdog ipmi_watchdog
ipmi_msghandler loop psmouse serio_raw pcspkr i2c_i801 i2c_core button
rng_core shpchp pci_hotplug intel_agp evdev ext3 jbd mbcache dm_mirror
dm_log dm_snapshot dm_mod ide_cd_mod cdrom ide_disk ide_pci_generic
ata_piix piix ide_core ata_generic libata scsi_mod dock skge ehci_hcd
uhci_hcd thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
Nov 16 03:00:29 ariel kernel: [31375.027370] Pid: 3165, comm: cqueue Not
tainted 2.6.26-2-xen-amd64 #1
Nov 16 03:00:29 ariel kernel: [31375.027405] RIP:
e030:[<ffffffffa02f9169>] [<ffffffffa02f9169>]
:drbd:drbd_connector_callback+0x32/0x181
Nov 16 03:00:29 ariel kernel: [31375.027485] RSP: e02b:ffff8800104f3e50
EFLAGS: 00010206
Nov 16 03:00:29 ariel kernel: [31375.027519] RAX: 0000000000000000 RBX:
ffff88001648c220 RCX: 0000000000000000
Nov 16 03:00:29 ariel kernel: [31375.027555] RDX: 0000000000000000 RSI:
0000000000000000 RDI: ffff8800164c9c10
Nov 16 03:00:29 ariel kernel: [31375.027597] RBP: ffff88001648c1d8 R08:
ffff8800104f2000 R09: ffffffff80553e18
Nov 16 03:00:29 ariel kernel: [31375.027633] R10: 0000000000000000 R11:
7fffffffffffffff R12: ffff8800164c9c10
Nov 16 03:00:29 ariel kernel: [31375.027669] R13: ffffffffa02d30c3 R14:
ffffffff8057d1c0 R15: 0000000000000000
Nov 16 03:00:29 ariel kernel: [31375.027709] FS: 00007f9ee13c46e0(0000)
GS:ffffffff8053a000(0000) knlGS:0000000000000000
Nov 16 03:00:29 ariel kernel: [31375.027761] CS: e033 DS: 0000 ES: 0000
Nov 16 03:00:29 ariel kernel: [31375.027793] DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
Nov 16 03:00:29 ariel kernel: [31375.027829] DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
Nov 16 03:00:29 ariel kernel: [31375.027866] Process cqueue (pid: 3165,
threadinfo ffff8800104f2000, task ffff8800161e1440)
Nov 16 03:00:29 ariel kernel: [31375.027918] Stack: 0000000000000000
ffff88001648c220 ffff88001648c1d8 ffff88001648c1d0
Nov 16 03:00:29 ariel kernel: [31375.028024] ffffffffa02d30c3
ffffffff8057d1c0 0000000000000000 ffffffffa02d30d8
Nov 16 03:00:29 ariel kernel: [31375.028120] 7fffffffffffffff
ffff880016f76840 ffff88001648c1d0 ffffffff8023c34c
Nov 16 03:00:29 ariel kernel: [31375.028185] Call Trace:
Nov 16 03:00:29 ariel kernel: [31375.028250] [<ffffffffa02d30c3>] ?
:cn:cn_queue_wrapper+0x0/0x33
Nov 16 03:00:29 ariel kernel: [31375.028393] [<ffffffffa02d30d8>] ?
:cn:cn_queue_wrapper+0x15/0x33
Nov 16 03:00:29 ariel kernel: [31375.028439] [<ffffffff8023c34c>] ?
run_workqueue+0xbe/0x189
Nov 16 03:00:29 ariel kernel: [31375.028482] [<ffffffff8023cd35>] ?
worker_thread+0xd5/0xe0
Nov 16 03:00:29 ariel kernel: [31375.028522] [<ffffffff8023f6c1>] ?
autoremove_wake_function+0x0/0x2e
Nov 16 03:00:29 ariel kernel: [31375.028564] [<ffffffff8023cc60>] ?
worker_thread+0x0/0xe0
Nov 16 03:00:29 ariel kernel: [31375.028601] [<ffffffff8023f593>] ?
kthread+0x47/0x74
Nov 16 03:00:29 ariel kernel: [31375.028637] [<ffffffff802283a8>] ?
schedule_tail+0x27/0x5c
Nov 16 03:00:29 ariel kernel: [31375.028677] [<ffffffff8020be28>] ?
child_rip+0xa/0x12
Nov 16 03:00:29 ariel kernel: [31375.028722] [<ffffffff8023f54c>] ?
kthread+0x0/0x74
Nov 16 03:00:29 ariel kernel: [31375.028760] [<ffffffff8020be1e>] ?
child_rip+0x0/0x12
Nov 16 03:00:29 ariel kernel: [31375.028796]
Nov 16 03:00:29 ariel kernel: [31375.028824]
Nov 16 03:00:29 ariel kernel: [31375.028852] Code: 41 55 41 54 49 89 fc
55 53 48 83 ec 08 65 8b 04 25 24 00 00 00 83 3d a6 75 01 00 02 74 1e 89
c0 48 c1 e0 07 48 ff 80 00 09 31 a0 <f6> 42 16 20 be 98 00 00 00 0f 84
20 01 00 00 eb 1a 41 5b 5b 5d
Nov 16 03:00:29 ariel kernel: [31375.029581] RIP [<ffffffffa02f9169>]
:drbd:drbd_connector_callback+0x32/0x181
Nov 16 03:00:29 ariel kernel: [31375.029657] RSP <ffff8800104f3e50>
Nov 16 03:00:29 ariel kernel: [31375.029688] CR2: 0000000000000016
Nov 16 03:00:29 ariel kernel: [31375.030762] ---[ end trace
296f6157c8798c56 ]---
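
The faulting instruction can be decoded by hand from the "Code:" line above. The following is a minimal sketch, not a general disassembler: it handles only the single encoding seen here (opcode f6 /0 with an 8-bit displacement), and the helper names faulting_bytes and decode_test_imm8 are made up for illustration. The byte string and the RDX value are copied from the trace.

```python
# "Code:" bytes copied from the oops above; <..> marks the faulting byte.
CODE = ("41 55 41 54 49 89 fc 55 53 48 83 ec 08 65 8b 04 25 24 00 00 00 "
        "83 3d a6 75 01 00 02 74 1e 89 c0 48 c1 e0 07 48 ff 80 00 09 31 a0 "
        "<f6> 42 16 20 be 98 00 00 00 0f 84 20 01 00 00 eb 1a 41 5b 5b 5d")

# 64-bit register names indexed by the 3-bit ModRM.rm field (no REX prefix).
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code):
    """Return the bytes from the <..>-marked faulting byte onward."""
    tokens = code.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return [int(t.strip("<>"), 16) for t in tokens[start:]]


def decode_test_imm8(b):
    """Decode only 'f6 /0 ib' with a disp8 memory operand (TEST r/m8, imm8)."""
    opcode, modrm, disp8, imm8 = b[0], b[1], b[2], b[3]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    # mod == 1 means "register + 8-bit displacement"; rm == 4 would need a SIB byte.
    if opcode != 0xF6 or reg != 0 or mod != 1 or rm == 4:
        raise ValueError("not the simple TEST r/m8, imm8 form handled here")
    disp = disp8 - 256 if disp8 >= 0x80 else disp8  # disp8 is signed
    return REG64[rm], disp, imm8


base, disp, imm = decode_test_imm8(faulting_bytes(CODE))
print(f"testb ${imm:#x}, {disp:#x}(%{base})")

# RDX as printed in the register dump of the oops.
RDX = 0x0000000000000000
print(f"fault address: {RDX + disp:#018x}")
```

The computed address matches CR2 (0000000000000016), which is consistent with a read through a NULL pointer held in RDX at offset 0x16. Which structure that pointer was supposed to reference cannot be told from the trace alone.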
Jean-Francois Chevrette wrote:
It appears that there is currently a problem with the latest
CentOS/Red Hat kernel. We have noticed the same problem when using LVM
snapshots and a backup technology called R1Soft CDP.
Some related info:
http://bugs.centos.org/view.php?id=3869
forum.r1soft.com/showthread.php?t=1158
There is no sign of a corresponding bug at bugzilla.redhat.com.
For now we have reverted to kernel-2.6.18-128.7.1, on which we have not
had any issues for the past 4 hours. Previously, the kernel panic would
occur a few seconds after starting a 'drbdadm verify'.
DRBD devs might want to check it out.
Regards,
_______________________________________________
drbd-user mailing list
http://lists.linbit.com/mailman/listinfo/drbd-user