Serial console?
Netconsole?
Logs?

Which logs are you interested in? This is the first time I am seriously 
troubleshooting a DRBD problem.
/var/log/messages simply stops receiving messages at the time of the freeze 
(see snippet below). Is there some debug level I can increase for DRBD?


Network stress tests not using DRBD?
General stress tests?
Memtest?

The problem happens on the "production LAN" as well as on a 4-port "1Gig staging 
switch". iperf shows normal values in all cases.
The problem happens on Fujitsu Siemens RX200/RX300 servers; six Fujitsu Siemens 
servers in total are affected. Other servers I have installed do not have 
this problem. The Fujitsu Siemens servers have onboard Broadcom interfaces ("NIC: 
NetXtreme II BCM5708 Gigabit Ethernet").


---------- /var/log/messages on the target machine --------------
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: PingAck did not arrive in time.
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: asender terminated
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: Terminating asender thread
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: short read expecting header on sock: r=-512
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: Connection closed
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: conn( NetworkFailure -> Unconnected )
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: receiver terminated
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: Restarting receiver thread
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: receiver (re)started
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: conn( Unconnected -> WFConnection )
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: PingAck did not arrive in time.
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: asender terminated
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: Terminating asender thread
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: short read expecting header on sock: r=-512
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: Connection closed
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: receiver terminated
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: Restarting receiver thread
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: receiver (re)started
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: conn( Unconnected -> WFConnection )
---------- here it is frozen -------------------------------
---------- /var/log/messages on the target machine --------------
Here it stops until the boot messages from the reboot show up.
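
To make a snippet like the one above easier to compare between the two nodes, the state changes can be pulled out of the log mechanically. This is only a minimal sketch in Python: the function name `parse_transitions` and the regular expressions are my own illustration, not anything shipped with DRBD, and they assume the syslog format shown in the snippet (one message per line).

```python
import re

# Matches one DRBD state change such as "conn( Connected -> NetworkFailure )".
TRANSITION = re.compile(r"(\w+)\( (\w+) -> (\w+) \)")
# Matches the syslog prefix and device name, e.g.
# "Sep 25 11:33:13 host kernel: block drbd2: <message>".
LINE = re.compile(r"^(\w{3}\s+\d+ [\d:]{8}) \S+ kernel: block (drbd\d+): (.*)$")


def parse_transitions(lines):
    """Yield (timestamp, device, field, old, new) for each state change found."""
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # not a DRBD kernel message in the assumed format
        stamp, device, message = m.groups()
        for field, old, new in TRANSITION.findall(message):
            yield stamp, device, field, old, new


# Sample lines copied from the snippet above.
sample = [
    "Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: peer( Secondary -> Unknown ) "
    "conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )",
    "Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: asender terminated",
    "Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: conn( Unconnected -> WFConnection )",
]

if __name__ == "__main__":
    for row in parse_transitions(sample):
        print(*row)
```

Running the same function over /var/log/messages from both nodes gives two short timelines that can be laid side by side.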

mfg,

jeroen.

Lars Ellenberg wrote:
On Fri, Sep 25, 2009 at 01:10:24PM +0200, Jeroen Groenewegen van der Weyden 
wrote:
Anybody?

The same seems to happen with 8.3.3RC2, although the failure is now either a freeze of the whole system or the system disconnecting all network interfaces. Anybody?

mfg,

jeroen

Jeroen Groenewegen van der Weyden wrote:
Hello,

I have a problem: when doing a full sync with DRBD, the target machine freezes. The scenario is simple: whenever a full sync is started, manually or automatically, the sync stalls after some time. A few moments after the sync reaches the stalled state, the target machine freezes entirely.

OpenSuse 11.1
kernel 2.6.27.21-0.1-xen #
drbd 8.3.1

NIC: NetXtreme II BCM5708 Gigabit Ethernet

On the Source Machine:
cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by r...@defaultnode, 2009-04-27 11:34:17
0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
ns:324524 nr:0 dw:110988 dr:689400 al:263 bm:242 lo:0 pe:2131 ua:978 ap:36 ep:1 wo:b oos:1635880
       [==>.................] sync'ed: 16.4% (1635880/1951768)K
       stalled
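
As a rough aid for catching the moment the sync stops making progress, the progress line above can be parsed and sampled repeatedly. The sketch below is illustrative only: `parse_sync` and `no_progress` are hypothetical helper names, the regular expression assumes the exact `sync'ed: N% (remaining/total)K` layout shown here, and the percentage is taken as printed rather than recomputed.

```python
import re

# Matches e.g. "sync'ed: 16.4% (1635880/1951768)K" as printed in /proc/drbd.
SYNC = re.compile(r"sync'ed:\s*([\d.]+)%\s*\((\d+)/(\d+)\)K")


def parse_sync(text):
    """Return sync progress from /proc/drbd text, or None if no sync is shown."""
    m = SYNC.search(text)
    if m is None:
        return None
    return {
        "percent": float(m.group(1)),    # value as printed by DRBD
        "remaining_k": int(m.group(2)),  # first number in the parentheses
        "total_k": int(m.group(3)),      # second number in the parentheses
        "stalled": "stalled" in text,    # DRBD printed the word "stalled"
    }


def no_progress(previous, current):
    """True if two consecutive samples show the same amount left to sync."""
    return (
        previous is not None
        and current is not None
        and previous["remaining_k"] == current["remaining_k"]
    )


# Sample text copied from the output above.
sample = """\
       [==>.................] sync'ed: 16.4% (1635880/1951768)K
       stalled
"""

if __name__ == "__main__":
    print(parse_sync(sample))
```

In practice one would read /proc/drbd every few seconds on the source machine, call `parse_sync` on it, and log a timestamp as soon as `no_progress` returns true, so the time of the stall can be matched against the target's logs.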

How to find out what is happening here?

Serial console?
Netconsole?
Logs?

Network stress tests not using DRBD?
General stress tests?
Memtest?

(and prevent it in the future.)



_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
