Serial console?
Netconsole?
Logs?
Which logs are you interested in? This is the first time I'm seriously
troubleshooting a DRBD problem.
/var/log/messages just stops receiving messages at the time of the freeze
(see snippet below). Is there some debug level I can increase for DRBD?
Network stress tests not using DRBD?
General stress tests?
Memtest?
The problem happens on the "production LAN" as well as on a 4-port "1Gig staging
switch". iperf shows normal values in all cases.
The problem happens on Fujitsu Siemens RX200/RX300 servers; 6 Fujitsu Siemens
servers in total are affected. Other servers I have installed do not have
this problem. The Fujitsu Siemens servers have onboard Broadcom interfaces ("NIC:
NetXtreme II BCM5708 Gigabit Ethernet").
---------- /var/log/messages on the target machine --------------
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: PingAck did not arrive in time.
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: asender terminated
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: Terminating asender thread
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: short read expecting header on sock: r=-512
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: Connection closed
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: conn( NetworkFailure -> Unconnected )
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: receiver terminated
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: Restarting receiver thread
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: receiver (re)started
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: conn( Unconnected -> WFConnection )
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: PingAck did not arrive in time.
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: asender terminated
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: Terminating asender thread
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: short read expecting header on sock: r=-512
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: Connection closed
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: receiver terminated
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: Restarting receiver thread
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: receiver (re)started
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: conn( Unconnected -> WFConnection )
---------- here it is frozen -------------------------------
---------- /var/log/messages on the target machine --------------
Here it stops until the boot messages of the reboot show up.
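
To compare what each node logged just before a freeze, it can help to reduce the log to its DRBD state changes. The following is only a minimal Python sketch, not anything shipped with DRBD: it assumes one log entry per line, in the layout seen in the snippet above, and the function name drbd_transitions and both regular expressions are illustrative choices derived from that snippet alone.

```python
import re
import sys

# Assumed syslog layout, taken from the snippet above:
# "Sep 25 11:33:13 <host> kernel: block drbdN: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"block\s+(?P<dev>drbd\d+):\s+(?P<msg>.*)$"
)
# One state change, e.g. "conn( Connected -> NetworkFailure )"
TRANSITION_RE = re.compile(r"(\w+)\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def drbd_transitions(lines):
    """Return (timestamp, device, field, old, new) for every state change."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a DRBD kernel message in the assumed layout
        for field, old, new in TRANSITION_RE.findall(m.group("msg")):
            events.append((m.group("ts"), m.group("dev"), field, old, new))
    return events


if __name__ == "__main__":
    # Usage (hypothetical script name): python drbd_transitions.py /var/log/messages
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as fh:
            for ts, dev, field, old, new in drbd_transitions(fh):
                print(f"{ts} {dev} {field}: {old} -> {new}")
```

Running it over the logs of both nodes and lining the output up by timestamp shows which side noticed the connection loss first.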
mfg,
jeroen.
Lars Ellenberg wrote:
On Fri, Sep 25, 2009 at 01:10:24PM +0200, Jeroen Groenewegen van der Weyden
wrote:
Anybody?
The same seems to happen with 8.3.3RC2, although there the failure is either
that the system freezes or that it disconnects all network interfaces.
Anybody?
mfg,
jeroen
Jeroen Groenewegen van der Weyden wrote:
Hello,
I have a problem when doing a full sync with DRBD: the target machine
freezes. The scenario is simple: whenever a full sync is started, manually or
automatically, the syncing stalls after some time. A few moments after the
sync reaches the stalled state, the target machine
freezes entirely.
OpenSuse 11.1
kernel 2.6.27.21-0.1-xen #
drbd 8.3.1
NIC: NetXtreme II BCM5708 Gigabit Ethernet
On the Source Machine:
cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by r...@defaultnode, 2009-04-27 11:34:17
0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
ns:324524 nr:0 dw:110988 dr:689400 al:263 bm:242 lo:0 pe:2131 ua:978 ap:36 ep:1 wo:b oos:1635880
[==>.................] sync'ed: 16.4% (1635880/1951768)K
stalled
How to find out what is happening here?
Serial console?
Netconsole?
Logs?
Network stress tests not using DRBD?
General stress tests?
Memtest?
(and prevent it in the future.)
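
One way to pin down when the sync stops moving is to sample /proc/drbd on the source machine and compare the "(left/total)K" figures between samples. The sketch below is a rough illustration under stated assumptions, not a DRBD tool: it assumes a single resource is syncing, that the progress line looks like the one quoted above, and the names parse_sync, is_stalled and the 10 second interval are made up for this example.

```python
import re
import time

# Matches the "(left/total)K" part of the sync'ed line in /proc/drbd,
# e.g. "(1635880/1951768)K". Only the first match is used (assumption:
# one resource is syncing at a time).
PROGRESS_RE = re.compile(r"\((\d+)/(\d+)\)K")


def parse_sync(text):
    """Return (left_kib, total_kib), or None if no sync line is present."""
    m = PROGRESS_RE.search(text)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def is_stalled(prev_text, curr_text):
    """True if both samples show a sync and the amount left did not shrink."""
    prev, curr = parse_sync(prev_text), parse_sync(curr_text)
    if prev is None or curr is None:
        return False  # not syncing in one of the samples
    return curr[0] >= prev[0]


def read_status(path="/proc/drbd"):
    with open(path) as fh:
        return fh.read()


if __name__ == "__main__":
    INTERVAL = 10  # seconds between samples; arbitrary choice
    prev = read_status()
    while True:
        time.sleep(INTERVAL)
        curr = read_status()
        if is_stalled(prev, curr):
            stamp = time.strftime("%b %d %H:%M:%S")
            print(f"{stamp} sync made no progress in {INTERVAL}s", flush=True)
        prev = curr
```

The timestamps it prints can then be matched against the last entries in /var/log/messages on the target machine.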