Hi,

I'm hitting a bug in DRBD with the latest CentOS and DRBD 8.3.12, using GFS2 on top with cman and rgmanager.

Here is the simplest way to reproduce it:
1. Start drbd on node s2
2. Start drbd on node s3
They sync up:
[root@s3 ~]# cat /proc/drbd
version: 8.3.12 (api:88/proto:86-96)
GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by dag@Build64R6, 2011-11-20 10:57:03
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
ns:0 nr:45060 dw:45056 dr:660 al:0 bm:11 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
3. Start cman on s2 and s3 so I can use GFS2; the cluster comes up OK:
[root@s3 ~]# cman_tool status
Version: 6.2.0
Config Version: 8
Cluster Name: stor
Cluster Id: 61164
Cluster Member: Yes
Cluster Generation: 140
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node
Ports Bound: 0
Node name: s3alt.c.XX.si
Node ID: 3
Multicast addresses: 239.192.238.219 239.192.0.2
Node addresses: 192.168.168.3 10.31.0.42
4. Start gfs2 on both nodes:
Mar 16 10:29:41 s3 kernel: GFS2 (built Mar  7 2012 00:54:51) installed
Mar 16 10:29:41 s3 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "stor:drbdstor"
Mar 16 10:29:41 s3 kernel: dlm: Using SCTP for communications
Mar 16 10:29:41 s3 kernel: SCTP: Hash tables configured (established 65536 bind 65536)
Mar 16 10:29:41 s3 kernel: dlm: connecting to 2 sctp association 1
Mar 16 10:29:41 s3 kernel: GFS2: fsid=stor:drbdstor.1: Joined cluster. Now mounting FS...
Mar 16 10:29:42 s3 kernel: GFS2: fsid=stor:drbdstor.1: jid=1, already locked for use
Mar 16 10:29:42 s3 kernel: GFS2: fsid=stor:drbdstor.1: jid=1: Looking at journal...
Mar 16 10:29:42 s3 kernel: GFS2: fsid=stor:drbdstor.1: jid=1: Done
5. Stop gfs2 on s3 (nothing was written to the DRBD-backed mount from either s2 or s3 while it was mounted)
6. Stop drbd on s3:
Mar 16 10:32:02 s3 kernel: block drbd0: role( Primary -> Secondary )
Mar 16 10:32:02 s3 kernel: block drbd0: bitmap WRITE of 0 pages took 0 jiffies
Mar 16 10:32:02 s3 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Mar 16 10:32:02 s3 kernel: block drbd0: Requested state change failed by peer: Refusing to be Primary while peer is not outdated (-7)
Mar 16 10:32:02 s3 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) disk( UpToDate -> Outdated ) pdsk( UpToDate -> DUnknown )
Mar 16 10:32:02 s3 kernel: block drbd0: asender terminated
Mar 16 10:32:02 s3 kernel: block drbd0: Terminating asender thread
Mar 16 10:32:02 s3 kernel: block drbd0: Connection closed
Mar 16 10:32:02 s3 kernel: block drbd0: conn( Disconnecting -> StandAlone )
Mar 16 10:32:02 s3 kernel: block drbd0: receiver terminated
Mar 16 10:32:02 s3 kernel: block drbd0: Terminating receiver thread
Mar 16 10:32:02 s3 kernel: block drbd0: disk( Outdated -> Failed )
Mar 16 10:32:02 s3 kernel: block drbd0: disk( Failed -> Diskless )
Mar 16 10:32:02 s3 kernel: block drbd0: drbd_bm_resize called with capacity == 0
Mar 16 10:32:02 s3 kernel: block drbd0: worker terminated
Mar 16 10:32:02 s3 kernel: block drbd0: Terminating worker thread
Mar 16 10:32:02 s3 kernel: drbd: module cleanup done
7. Start drbd on s3 ->
KABUM: s3 kernel panicked:
Message from syslogd@s3 at Mar 16 10:32:45 ...
 kernel:Oops: 0000 [#1] SMP
Message from syslogd@s3 at Mar 16 10:32:45 ...
 kernel:last sysfs file: /sys/devices/virtual/block/drbd0/removable
Message from syslogd@s3 at Mar 16 10:32:45 ...
 kernel:Stack:
Message from syslogd@s3 at Mar 16 10:32:45 ...
 kernel:Call Trace:
Message from syslogd@s3 at Mar 16 10:32:45 ...
kernel:Code: 83 ec 20 48 89 1c 24 4c 89 64 24 08 4c 89 6c 24 10 4c 89 74 24 18 0f 1f 44 00 00 4c 8b b7 a0 00 00 00 89 f3 49 89 d4 49 8b 46 38 <4c> 8b 68 38 8d 83 10 76 ff ff 83 f8 0f 76 51 8d 83 00 75 ff ff
Message from syslogd@s3 at Mar 16 10:32:45 ...
 kernel:CR2: 0000000000000038
Message from syslogd@s3 at Mar 16 10:32:45 ...
 kernel:Kernel panic - not syncing: Fatal exception

cman on s2 then fences off the hung s3.
Should I provide more info?
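
In case it helps anyone reproducing this: a minimal sketch of how the state from step 2 could be checked before moving on to cman and gfs2. It only parses the per-device status line of /proc/drbd in the 8.3 format pasted above; the helper names (parse_drbd_status, is_dual_primary_in_sync) are made up for illustration and are not part of DRBD.

```python
import re

# Per-device status line of /proc/drbd as printed by DRBD 8.3, e.g.
# " 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----"
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<ro>\S+)\s+ds:(?P<ds>\S+)"
)


def parse_drbd_status(text):
    """Return {minor: {"cs": ..., "ro": ..., "ds": ...}} for each device line."""
    devices = {}
    for line in text.splitlines():
        m = STATUS_RE.match(line)
        if m:
            devices[int(m.group("minor"))] = {
                "cs": m.group("cs"),
                "ro": m.group("ro"),
                "ds": m.group("ds"),
            }
    return devices


def is_dual_primary_in_sync(dev):
    """True if the device is connected, Primary on both sides and fully synced."""
    return (
        dev["cs"] == "Connected"
        and dev["ro"] == "Primary/Primary"
        and dev["ds"] == "UpToDate/UpToDate"
    )


# Sample taken from the output in step 2 of this report.
SAMPLE = """version: 8.3.12 (api:88/proto:86-96)
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
ns:0 nr:45060 dw:45056 dr:660 al:0 bm:11 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
"""

if __name__ == "__main__":
    for minor, dev in parse_drbd_status(SAMPLE).items():
        print(minor, dev, is_dual_primary_in_sync(dev))
```

On a live node the same functions could be fed the contents of /proc/drbd instead of SAMPLE.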

Here is one of the more detailed errors I managed to capture while testing:
Mar 16 09:39:50 s2 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
Mar 16 09:39:50 s2 kernel: IP: [<ffffffff814185a0>] sock_ioctl+0x30/0x280
Mar 16 09:39:50 s2 kernel: PGD 238460067 PUD 229f59067 PMD 0
Mar 16 09:39:50 s2 kernel: Oops: 0000 [#1] SMP
Mar 16 09:39:50 s2 kernel: last sysfs file: /sys/module/drbd/parameters/cn_idx
Mar 16 09:39:50 s2 kernel: CPU 0
Mar 16 09:39:50 s2 kernel: Modules linked in: gfs2 drbd(U) sctp libcrc32c dlm configfs sunrpc 8021q garp stp llc bonding ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ext2 raid0 serio_raw i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif ahci igb dca e1000e dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: drbd]
Mar 16 09:39:50 s2 kernel:
Mar 16 09:39:50 s2 kernel: Pid: 4875, comm: drbdadm Tainted: G W ---------------- 2.6.32-220.7.1.el6.x86_64 #1 Supermicro X9SCL/X9SCM/X9SCL/X9SCM
Mar 16 09:39:50 s2 kernel: RIP: 0010:[<ffffffff814185a0>] [<ffffffff814185a0>] sock_ioctl+0x30/0x280
Mar 16 09:39:50 s2 kernel: RSP: 0018:ffff880229bf7e38  EFLAGS: 00010282
Mar 16 09:39:50 s2 kernel: RAX: 0000000000000000 RBX: 0000000000005401 RCX: 00007fff226c2180
Mar 16 09:39:50 s2 kernel: RDX: 00007fff226c2180 RSI: 0000000000005401 RDI: ffff880233a91980
Mar 16 09:39:50 s2 kernel: RBP: ffff880229bf7e58 R08: ffffffff8165fa40 R09: 00007fbd4cb1c940
Mar 16 09:39:50 s2 kernel: R10: 00007fff226c1f90 R11: 0000000000000206 R12: 00007fff226c2180
Mar 16 09:39:50 s2 kernel: R13: 00007fff226c2180 R14: ffff88023ab51200 R15: 0000000000000000
Mar 16 09:39:50 s2 kernel: FS: 00007fbd4cd25700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
Mar 16 09:39:50 s2 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 16 09:39:50 s2 kernel: CR2: 0000000000000038 CR3: 0000000238d69000 CR4: 00000000000406f0
Mar 16 09:39:50 s2 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 16 09:39:50 s2 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 16 09:39:50 s2 kernel: Process drbdadm (pid: 4875, threadinfo ffff880229bf6000, task ffff8802299c3580)
Mar 16 09:39:50 s2 kernel: Stack:
Mar 16 09:39:50 s2 kernel: ffff880233a91980 ffff88023ab51248 00007fff226c2180 0000000000000000
Mar 16 09:39:50 s2 kernel: <0> ffff880229bf7e98 ffffffff811892f2 ffff880229bf7e98 ffffffff814f253e
Mar 16 09:39:50 s2 kernel: <0> 0000000000000001 0000000000000003 0000000000627760 ffff880233a91980
Mar 16 09:39:50 s2 kernel: Call Trace:
Mar 16 09:39:50 s2 kernel: [<ffffffff811892f2>] vfs_ioctl+0x22/0xa0
Mar 16 09:39:50 s2 kernel: [<ffffffff814f253e>] ? do_page_fault+0x3e/0xa0
Mar 16 09:39:50 s2 kernel: [<ffffffff81189494>] do_vfs_ioctl+0x84/0x580
Mar 16 09:39:50 s2 kernel: [<ffffffff81189a11>] sys_ioctl+0x81/0xa0
Mar 16 09:39:50 s2 kernel: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Mar 16 09:39:50 s2 kernel: Code: 83 ec 20 48 89 1c 24 4c 89 64 24 08 4c 89 6c 24 10 4c 89 74 24 18 0f 1f 44 00 00 4c 8b b7 a0 00 00 00 89 f3 49 89 d4 49 8b 46 38 <4c> 8b 68 38 8d 83 10 76 ff ff 83 f8 0f 76 51 8d 83 00 75 ff ff
Mar 16 09:39:50 s2 kernel: RIP  [<ffffffff814185a0>] sock_ioctl+0x30/0x280
Mar 16 09:39:50 s2 kernel: RSP <ffff880229bf7e38>
Mar 16 09:39:50 s2 kernel: CR2: 0000000000000038
Mar 16 09:39:50 s2 kernel: ---[ end trace bf74669367969d52 ]---
Mar 16 09:39:50 s2 kernel: Kernel panic - not syncing: Fatal exception
Mar 16 09:39:50 s2 kernel: Pid: 4875, comm: drbdadm Tainted: G D W ---------------- 2.6.32-220.7.1.el6.x86_64 #1
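
For what it's worth, a rough sketch of how the Code: bytes in this trace line up with CR2. The script only pulls out the bytes starting at the <4c> marker and the CR2 value; reading `4c 8b 68 38` as `mov 0x38(%rax),%r13` is my own manual decode, not tool output, and the helper names are made up for illustration. With RAX = 0 in the register dump, an access at offset 0x38 from RAX would fault at exactly the CR2 address, which would fit the "NULL pointer dereference at 0000000000000038" message in sock_ioctl.

```python
import re


def faulting_bytes(code_line, count=4):
    """Return `count` opcode bytes starting at the one marked <xx> in a Code: line."""
    _, _, tail = code_line.partition("Code:")
    tokens = tail.split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            first = int(tok[1:-1], 16)
            rest = [int(t, 16) for t in tokens[i + 1:i + count]]
            return [first] + rest
    raise ValueError("no faulting-byte marker found")


def parse_hex_field(line, name):
    """Return the hex value that follows e.g. 'CR2:' or 'RAX:' in an oops line."""
    m = re.search(re.escape(name) + r":\s*([0-9a-fA-F]+)", line)
    if m is None:
        raise ValueError("field %s not found" % name)
    return int(m.group(1), 16)


# Lines copied from the trace above.
CODE_LINE = (
    "kernel: Code: 83 ec 20 48 89 1c 24 4c 89 64 24 08 4c 89 6c 24 10 "
    "4c 89 74 24 18 0f 1f 44 00 00 4c 8b b7 a0 00 00 00 89 f3 49 89 d4 "
    "49 8b 46 38 <4c> 8b 68 38 8d 83 10 76 ff ff 83 f8 0f 76 51 8d 83 00 75 ff ff"
)
CR2_LINE = "kernel: CR2: 0000000000000038"
RAX_LINE = "kernel: RAX: 0000000000000000 RBX: 0000000000005401 RCX: 00007fff226c2180"

if __name__ == "__main__":
    insn = faulting_bytes(CODE_LINE)
    cr2 = parse_hex_field(CR2_LINE, "CR2")
    rax = parse_hex_field(RAX_LINE, "RAX")
    print("faulting bytes:", " ".join("%02x" % b for b in insn))
    # Assumption: the last of the four bytes is the 8-bit displacement added to RAX.
    print("RAX + disp8 == CR2:", rax + insn[3] == cr2)
```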
