Hello,

On Mon, 10 Jan 2011, Holger Kiehl wrote:
Hello,

upgrading the kernel on the secondary from 2.6.36.2 to 2.6.37 gives me the following error on the primary:

   Jan 10 12:41:57 obelix kernel: block drbd0: BAD! BarrierAck #2350363662 received, expected #2350363661!
   Jan 10 12:41:57 obelix kernel: block drbd0: peer( Secondary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
   Jan 10 12:41:57 obelix kernel: block drbd0: short read expecting header on sock: r=-512
   Jan 10 12:41:57 obelix kernel: block drbd0: Creating new current UUID
   Jan 10 12:41:57 obelix kernel: block drbd0: asender terminated
   Jan 10 12:41:57 obelix kernel: block drbd0: Terminating drbd0_asender
   Jan 10 12:41:57 obelix kernel: block drbd0: Connection closed
   Jan 10 12:41:57 obelix kernel: block drbd0: conn( ProtocolError -> Unconnected )
   Jan 10 12:41:57 obelix kernel: block drbd0: receiver terminated
   Jan 10 12:41:57 obelix kernel: block drbd0: Restarting drbd0_receiver
   Jan 10 12:41:57 obelix kernel: block drbd0: receiver (re)started
   Jan 10 12:41:57 obelix kernel: block drbd0: conn( Unconnected -> WFConnection )
   Jan 10 12:41:57 obelix kernel: block drbd0: Handshake successful: Agreed network protocol version 95
   Jan 10 12:41:57 obelix kernel: block drbd0: conn( WFConnection -> WFReportParams )
   Jan 10 12:41:57 obelix kernel: block drbd0: Starting asender thread (from drbd0_receiver [3233])
   Jan 10 12:41:57 obelix kernel: block drbd0: data-integrity-alg: <not-used>
   Jan 10 12:41:57 obelix kernel: block drbd0: max_segment_size ( = BIO size ) = 65536
   Jan 10 12:41:57 obelix kernel: block drbd0: drbd_sync_handshake:
   Jan 10 12:41:57 obelix kernel: block drbd0: self 28DDE63A9DEC9869:19CC15BDDB81CF01:8C9904DC3E8DFFD7:F46F8C2F00547891 bits:500 flags:0
   Jan 10 12:41:57 obelix kernel: block drbd0: peer 19CC15BDDB81CF00:0000000000000000:8C9904DC3E8DFFD6:F46F8C2F00547891 bits:0 flags:0
   Jan 10 12:41:57 obelix kernel: block drbd0: uuid_compare()=1 by rule 70
   Jan 10 12:41:57 obelix kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )

Upgrading the primary to 2.6.37 did not help either; it produces the same errors. I tried this on two different clusters, and the above error always pops up if the secondary is running 2.6.37.

The same problem still exists when using kernel 2.6.38.1:

   Mar 25 08:54:20 obelix kernel: block drbd0: BAD! BarrierAck #1861867747 received, expected #1861867746!
   Mar 25 08:54:20 obelix kernel: block drbd0: peer( Secondary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
   Mar 25 08:54:20 obelix kernel: block drbd0: process_done_ee() = NOT_OK
   Mar 25 08:54:20 obelix kernel: block drbd0: asender terminated
   Mar 25 08:54:20 obelix kernel: block drbd0: Terminating drbd0_asender
   Mar 25 08:54:20 obelix kernel: block drbd0: short read expecting header on sock: r=-512
   Mar 25 08:54:20 obelix kernel: block drbd0: Creating new current UUID
   Mar 25 08:54:20 obelix kernel: block drbd0: Connection closed
   Mar 25 08:54:20 obelix kernel: block drbd0: conn( ProtocolError -> Unconnected )
   Mar 25 08:54:20 obelix kernel: block drbd0: receiver terminated
   Mar 25 08:54:20 obelix kernel: block drbd0: Restarting drbd0_receiver
   Mar 25 08:54:20 obelix kernel: block drbd0: receiver (re)started
   Mar 25 08:54:20 obelix kernel: block drbd0: conn( Unconnected -> WFConnection )
   Mar 25 08:54:20 obelix kernel: block drbd0: Handshake successful: Agreed network protocol version 94
   Mar 25 08:54:20 obelix kernel: block drbd0: conn( WFConnection -> WFReportParams )
   Mar 25 08:54:20 obelix kernel: block drbd0: Starting asender thread (from drbd0_receiver [3220])
   Mar 25 08:54:20 obelix kernel: block drbd0: data-integrity-alg: <not-used>
   Mar 25 08:54:20 obelix kernel: block drbd0: drbd_sync_handshake:
   Mar 25 08:54:20 obelix kernel: block drbd0: self 840572B18801AA3B:F99A9CC7F9DDDB47:916E679DA4726603:830351EC828F2F13 bits:191 flags:0
   Mar 25 08:54:20 obelix kernel: block drbd0: peer F99A9CC7F9DDDB46:0000000000000000:916E679DA4726602:830351EC828F2F13 bits:0 flags:0
   Mar 25 08:54:20 obelix kernel: block drbd0: uuid_compare()=1 by rule 70
   Mar 25 08:54:20 obelix kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
   Mar 25 08:54:20 obelix kernel: block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
   Mar 25 08:54:20 obelix kernel: block drbd0: Began resync as SyncSource (will sync 764 KB [191 bits set]).
   Mar 25 08:54:21 obelix kernel: block drbd0: Resync done (total 1 sec; paused 0 sec; 764 K/sec)
   Mar 25 08:54:21 obelix kernel: block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )

And this then recurs frequently:

   Mar 25 08:53:00 obelix kernel: block drbd0: BAD! BarrierAck #1224296926 received, expected #1224296925!
   Mar 25 08:54:20 obelix kernel: block drbd0: BAD! BarrierAck #1861867747 received, expected #1861867746!
   Mar 25 08:54:35 obelix kernel: block drbd0: BAD! BarrierAck #4040326970 received, expected #4040326969!
   Mar 25 08:56:21 obelix kernel: block drbd0: BAD! BarrierAck #1235958129 received, expected #1235958128!
   Mar 25 08:57:31 obelix kernel: block drbd0: BAD! BarrierAck #4096191267 received, expected #4096191266!
   Mar 25 08:58:51 obelix kernel: block drbd0: BAD! BarrierAck #1578973016 received, expected #1578973015!
   Mar 25 08:59:26 obelix kernel: block drbd0: BAD! BarrierAck #4131468500 received, expected #4131468499!
   Mar 25 09:00:08 obelix kernel: block drbd0: BAD! BarrierAck #4013314144 received, expected #4013314143!
   Mar 25 09:01:19 obelix kernel: block drbd0: BAD! BarrierAck #2538005992 received, expected #2538005991!

Kernel 2.6.36.x works without this problem.

Any idea what is causing this? What other information is needed to solve this issue?

Regards,
Holger
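
P.S. In every mismatch quoted above, the received barrier number is exactly one higher than the expected one. For anyone who wants to check their own logs for the same pattern, here is a minimal Python sketch. The function name `barrier_mismatches` and the sample list are only illustrative, and the regular expression assumes the message format shown in the logs above; nothing here is part of DRBD itself.

```python
import re

# Assumes the message format seen in the kernel logs quoted above.
BARRIER_RE = re.compile(r"BAD! BarrierAck #(\d+) received, expected #(\d+)!")


def barrier_mismatches(lines):
    """Return (received, expected, offset) for each BarrierAck mismatch line."""
    results = []
    for line in lines:
        match = BARRIER_RE.search(line)
        if match:
            received = int(match.group(1))
            expected = int(match.group(2))
            results.append((received, expected, received - expected))
    return results


# Three of the lines from the logs above, plus one unrelated line.
sample = [
    "Mar 25 08:53:00 obelix kernel: block drbd0: BAD! BarrierAck #1224296926 received, expected #1224296925!",
    "Mar 25 08:54:20 obelix kernel: block drbd0: BAD! BarrierAck #1861867747 received, expected #1861867746!",
    "Mar 25 08:54:20 obelix kernel: block drbd0: asender terminated",
    "Mar 25 08:54:35 obelix kernel: block drbd0: BAD! BarrierAck #4040326970 received, expected #4040326969!",
]

for received, expected, offset in barrier_mismatches(sample):
    print(received, expected, offset)
```

To run it against a real log, pass an open file instead of the sample list, for example `barrier_mismatches(open("/var/log/messages"))`; the path is an assumption and depends on the syslog setup.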
