Hi all,
We are running a web cluster based on a dual-primary DRBD configuration
and OCFS2. Every weekend we run an online verify on the DRBD volume by
executing "/sbin/drbdadm verify all" on one node. Last weekend, one node
(not the one executing the verify command) crashed completely, and we
found it this morning with a kernel panic message on the console.
Has anybody else observed this behavior?
OS: Linux server1.ucl.ac.be 2.6.18-194.3.1.el5 #1 SMP Thu May 13 13:08:30 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
DRBD: # modinfo drbd
filename: /lib/modules/2.6.18-194.3.1.el5/weak-updates/drbd83/drbd.ko
alias: block-major-147-*
license: GPL
version: 8.3.2
description: drbd - Distributed Replicated Block Device v8.3.2
author: Philipp Reisner <[email protected]>, Lars Ellenberg <[email protected]>
srcversion: EB9EAE1FF5D024E96B05208
depends:
vermagic: 2.6.18-128.7.1.el5 SMP mod_unload gcc-4.1
parm: minor_count:Maximum number of drbd devices (1-255) (uint)
parm: disable_sendpage:bool
parm: allow_oos:DONT USE! (bool)
parm: cn_idx:uint
parm: proc_details:int
parm: enable_faults:int
parm: fault_rate:int
parm: fault_count:int
parm: fault_devs:int
parm: usermode_helper:string
Log on server1:
Oct 10 00:42:01 server1 kernel: block drbd0: conn( Connected -> VerifyS )
Oct 10 00:42:01 server1 kernel: block drbd0: Starting Online Verify from sector 0
Oct 10 00:42:11 server1 kernel: block drbd0: PingAck did not arrive in time.
Oct 10 00:42:11 server1 kernel: block drbd0: peer( Primary -> Unknown ) conn( VerifyS -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Oct 10 00:42:11 server1 kernel: block drbd0: Online Verify reached sector 0
Oct 10 00:42:11 server1 kernel: block drbd0: asender terminated
Oct 10 00:42:11 server1 kernel: block drbd0: Terminating asender thread
Oct 10 00:42:11 server1 kernel: block drbd0: short read expecting header on sock: r=-512
Oct 10 00:42:11 server1 kernel: block drbd0: Creating new current UUID
Oct 10 00:42:11 server1 kernel: block drbd0: Connection closed
Oct 10 00:42:11 server1 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Oct 10 00:42:11 server1 kernel: block drbd0: receiver terminated
Oct 10 00:42:11 server1 kernel: block drbd0: Restarting receiver thread
Oct 10 00:42:11 server1 kernel: block drbd0: receiver (re)started
Oct 10 00:42:11 server1 kernel: block drbd0: conn( Unconnected -> WFConnection )
Log on server2:
Oct 10 00:42:01 server2 kernel: block drbd0: conn( Connected -> VerifyT )
Oct 10 00:42:01 server2 kernel: block drbd0: Online Verify start sector: 0
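To make the two logs easier to compare, here is a minimal Python sketch that pulls the state transitions (the "conn( A -> B )" style fields) out of kernel log text like the above. It is only an illustration and not part of DRBD: the function name transitions() and the list of field names (conn, peer, pdsk, disk, role) are my own assumptions, based on the fields visible in these logs. The pattern tolerates a transition that was wrapped across two lines by the mail client.

```python
import re

# Matches DRBD state-change fields such as "conn( Connected -> VerifyS )".
# The field list is an assumption taken from the log excerpts in this post.
# \s* / \s+ lets the pattern span a line break added by mail wrapping.
TRANSITION = re.compile(
    r"\b(conn|peer|pdsk|disk|role)\(\s*(\w+)\s+->\s+(\w+)\s*\)"
)


def transitions(log_text):
    """Return (field, old_state, new_state) tuples in order of appearance."""
    return [m.groups() for m in TRANSITION.finditer(log_text)]


# Two lines copied from the server1 log above.
sample = (
    "Oct 10 00:42:01 server1 kernel: block drbd0: "
    "conn( Connected -> VerifyS )\n"
    "Oct 10 00:42:11 server1 kernel: block drbd0: "
    "peer( Primary -> Unknown ) conn( VerifyS -> NetworkFailure ) "
    "pdsk( UpToDate -> DUnknown )\n"
)

for field, old, new in transitions(sample):
    print(f"{field}: {old} -> {new}")
```

Run against the full server1 log, this lists the connection going from Connected to VerifyS and then, ten seconds later, to NetworkFailure, Unconnected and WFConnection, while server2 only logs the switch to VerifyT before it goes silent.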
--
--------------------------------------------------------------------
Fabrice Charlier - UCL/SGSI/SIPR
Office : +32.10.47.32.34
GSM : +32.474.86.81.23
-------------------------------------------------------------------
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user