Lars: In my experience, the BrokenPipe messages always show up on the master when the slave produces the digest mismatch messages. From my point of view this is a bit misleading. Here is an example:

Master node:
Mar 1 08:43:37 node1 kernel: block drbd4: meta connection shut down by peer.
Mar 1 08:43:37 node1 kernel: block drbd4: sock was shut down by peer
Mar 1 08:43:37 node1 kernel: block drbd4: peer( Secondary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
Mar 1 08:43:37 node1 kernel: block drbd4: short read expecting header on sock: r=0
Mar 1 08:43:37 node1 kernel: block drbd4: asender terminated
Mar 1 08:43:37 node1 kernel: block drbd4: Creating new current UUID
Mar 1 08:43:37 node1 kernel: block drbd4: Terminating asender thread
Mar 1 08:43:37 node1 kernel: block drbd4: sock_sendmsg returned -32
Mar 1 08:43:37 node1 kernel: block drbd4: short sent ReportUUIDs size=56 sent=0
Mar 1 08:43:37 node1 kernel: block drbd4: Connection closed
Mar 1 08:43:37 node1 kernel: block drbd4: helper command: /sbin/drbdadm fence-peer minor-4
Mar 1 08:43:38 node1 cibadmin: [17587]: info: Invoked: cibadmin -C -o constraints -X <rsc_location rsc="ms_drbd" id="drbd-fence-by-handler-ms_drbd"> <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-rule-ms_drbd"> <expression attribute="#uname" operation="ne" value="node1" id="drbd-fence-by-handler-expr-ms_drbd"/> </rule> </rsc_location>
Mar 1 08:43:38 node1 kernel: block drbd4: helper command: /sbin/drbdadm fence-peer minor-4 exit code 4 (0x400)
Mar 1 08:43:38 node1 kernel: block drbd4: fence-peer helper returned 4 (peer was fenced)
Mar 1 08:43:38 node1 kernel: block drbd4: pdsk( DUnknown -> Outdated )
Mar 1 08:43:38 node1 kernel: block drbd4: conn( BrokenPipe -> Unconnected )
Mar 1 08:43:38 node1 kernel: block drbd4: receiver terminated
Mar 1 08:43:38 node1 kernel: block drbd4: Restarting receiver thread
Mar 1 08:43:38 node1 kernel: block drbd4: receiver (re)started
Mar 1 08:43:38 node1 kernel: block drbd4: conn( Unconnected -> WFConnection )
Mar 1 08:43:38 node1 kernel: block drbd4: Handshake successful: Agreed network protocol version 91
Mar 1 08:43:38 node1 kernel: block drbd4: conn( WFConnection -> WFReportParams )
Mar 1 08:43:38 node1 kernel: block drbd4: Starting asender thread (from drbd4_receiver [10020])
Mar 1 08:43:38 node1 kernel: block drbd4: data-integrity-alg: md5
Mar 1 08:43:38 node1 kernel: block drbd4: drbd_sync_handshake:
Mar 1 08:43:38 node1 kernel: block drbd4: self 0CC3D5A17A19D531:4064274FA17AB19D:00FFCB9ECAD3A6D4:50A168C54CD13071 bits:122 flags:0
Mar 1 08:43:38 node1 kernel: block drbd4: peer 4064274FA17AB19C:0000000000000000:00FFCB9ECAD3A6D4:50A168C54CD13071 bits:0 flags:0
Mar 1 08:43:38 node1 kernel: block drbd4: uuid_compare()=1 by rule 70
Mar 1 08:43:38 node1 kernel: block drbd4: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> UpToDate )
Mar 1 08:43:38 node1 kernel: block drbd4: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
Mar 1 08:43:38 node1 kernel: block drbd4: Began resync as SyncSource (will sync 488 KB [122 bits set]).
Mar 1 08:43:38 node1 kernel: block drbd4: Resync done (total 1 sec; paused 0 sec; 488 K/sec)
Mar 1 08:43:38 node1 kernel: block drbd4: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
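The master's side of the event reads most easily as a sequence of connection-state changes followed by a small resync. As a rough aid (not part of the original thread), the Python sketch below pulls the `conn( old -> new )` transitions out of log lines like those above and checks the reported resync size against the number of bitmap bits. It assumes DRBD's bitmap granularity of one bit per 4 KiB block, which is consistent with the "488 KB [122 bits set]" line; the function names are hypothetical.

```python
import re

# Assumption: one DRBD bitmap bit covers a 4 KiB block.
BITMAP_BLOCK_KIB = 4

CONN_RE = re.compile(r"conn\( (\w+) -> (\w+) \)")
RESYNC_RE = re.compile(r"will sync (\d+) KB \[(\d+) bits set\]")


def conn_transitions(lines):
    """Return the (old, new) connection states in log order."""
    transitions = []
    for line in lines:
        match = CONN_RE.search(line)
        if match:
            transitions.append((match.group(1), match.group(2)))
    return transitions


def resync_size_consistent(line):
    """Check that the reported KB equals bits set times the block size."""
    match = RESYNC_RE.search(line)
    if match is None:
        return False
    kib, bits = int(match.group(1)), int(match.group(2))
    return kib == bits * BITMAP_BLOCK_KIB


if __name__ == "__main__":
    # Excerpt of the master's log, with the syslog prefix left out.
    master_log = [
        "block drbd4: peer( Secondary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )",
        "block drbd4: conn( BrokenPipe -> Unconnected )",
        "block drbd4: conn( Unconnected -> WFConnection )",
        "block drbd4: conn( WFConnection -> WFReportParams )",
        "block drbd4: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> UpToDate )",
        "block drbd4: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )",
        "block drbd4: Began resync as SyncSource (will sync 488 KB [122 bits set]).",
        "block drbd4: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )",
    ]
    for old, new in conn_transitions(master_log):
        print(f"{old} -> {new}")
    print(resync_size_consistent(master_log[6]))  # 122 * 4 KiB == 488 KB
```

Run against the excerpt, this prints the master's path Connected -> BrokenPipe -> Unconnected -> WFConnection -> WFReportParams -> WFBitMapS -> SyncSource -> Connected and then `True`.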
Slave node:
Mar 1 08:43:37 node2 kernel: block drbd4: Digest integrity check FAILED.
Mar 1 08:43:37 node2 kernel: block drbd4: error receiving Data, l: 4136!
Mar 1 08:43:37 node2 kernel: block drbd4: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
Mar 1 08:43:37 node2 kernel: block drbd4: asender terminated
Mar 1 08:43:37 node2 kernel: block drbd4: Terminating asender thread
Mar 1 08:43:37 node2 kernel: block drbd4: Connection closed
Mar 1 08:43:37 node2 kernel: block drbd4: conn( ProtocolError -> Unconnected )
Mar 1 08:43:37 node2 kernel: block drbd4: receiver terminated
Mar 1 08:43:37 node2 kernel: block drbd4: Restarting receiver thread
Mar 1 08:43:37 node2 kernel: block drbd4: receiver (re)started
Mar 1 08:43:37 node2 kernel: block drbd4: conn( Unconnected -> WFConnection )
Mar 1 08:43:38 node2 kernel: block drbd4: Handshake successful: Agreed network protocol version 91
Mar 1 08:43:38 node2 kernel: block drbd4: conn( WFConnection -> WFReportParams )
Mar 1 08:43:38 node2 kernel: block drbd4: Starting asender thread (from drbd4_receiver [433])
Mar 1 08:43:38 node2 kernel: block drbd4: data-integrity-alg: md5
Mar 1 08:43:38 node2 kernel: block drbd4: drbd_sync_handshake:
Mar 1 08:43:38 node2 kernel: block drbd4: self 4064274FA17AB19C:0000000000000000:00FFCB9ECAD3A6D4:50A168C54CD13071 bits:0 flags:0
Mar 1 08:43:38 node2 kernel: block drbd4: peer 0CC3D5A17A19D531:4064274FA17AB19D:00FFCB9ECAD3A6D4:50A168C54CD13071 bits:122 flags:0
Mar 1 08:43:38 node2 kernel: block drbd4: uuid_compare()=-1 by rule 50
Mar 1 08:43:38 node2 kernel: block drbd4: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Mar 1 08:43:38 node2 kernel: block drbd4: conn( WFBitMapT -> WFSyncUUID )
Mar 1 08:43:38 node2 kernel: block drbd4: helper command: /sbin/drbdadm before-resync-target minor-4
Mar 1 08:43:38 node2 kernel: block drbd4: helper command: /sbin/drbdadm before-resync-target minor-4 exit code 0 (0x0)
Mar 1 08:43:38 node2 kernel: block drbd4: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
Mar 1 08:43:38 node2 kernel: block drbd4: Began resync as SyncTarget (will sync 488 KB [122 bits set]).
Mar 1 08:43:38 node2 kernel: block drbd4: Resync done (total 1 sec; paused 0 sec; 488 K/sec)
Mar 1 08:43:38 node2 kernel: block drbd4: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Mar 1 08:43:38 node2 kernel: block drbd4: helper command: /sbin/drbdadm after-resync-target minor-4
Mar 1 08:43:39 node2 kernel: block drbd4: helper command: /sbin/drbdadm after-resync-target minor-4 exit code 0 (0x0)

That system is running DRBD 8.3.4; 8.3.7 and 8.3.8.1 show the same output and behaviour on the systems we run them on. The OS is SLES 11 / SLES 11 SP1.

Best regards
Robert Köppl
System Administration
KNAPP Systemintegration GmbH
Waltenbachstraße 9
8700 Leoben, Austria
Phone: +43 3842 805-910
Fax: +43 3842 82930-500
[email protected]
www.KNAPP.com
Commercial register number: FN 138870x
Commercial register court: Leoben
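Robert's point is that the BrokenPipe on the master is only the visible side of a digest failure that is logged on the slave. A minimal sketch of how one might correlate the two nodes' logs is shown below. It assumes syslog lines in exactly the format quoted in this thread, takes the year 2011 from the mail headers because the log lines carry none, and uses an arbitrary two-second window; `parse` and `explain_broken_pipes` are hypothetical helpers, not DRBD tools.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as "Mar 1 08:43:37 node2 kernel: block drbd4: ...".
LINE_RE = re.compile(
    r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+kernel:\s+block\s+(drbd\d+):\s+(.*)$"
)

# Assumption: syslog omits the year; 2011 is taken from the mail headers.
ASSUMED_YEAR = 2011


def parse(line):
    """Return (timestamp, host, device, message), or None for other lines."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    stamp, host, device, message = match.groups()
    when = datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")
    return when, host, device, message


def explain_broken_pipes(primary_lines, secondary_lines, window=timedelta(seconds=2)):
    """Pair each BrokenPipe on the primary with a digest failure on the secondary."""
    digest_failures = [
        rec for rec in map(parse, secondary_lines)
        if rec and "Digest integrity check FAILED" in rec[3]
    ]
    pairs = []
    for rec in map(parse, primary_lines):
        if rec is None or "conn( Connected -> BrokenPipe )" not in rec[3]:
            continue
        for fail in digest_failures:
            # Same DRBD minor and close in time: most likely the same event.
            if fail[2] == rec[2] and abs(fail[0] - rec[0]) <= window:
                pairs.append((rec[2], rec[0], fail[1]))
                break
    return pairs


if __name__ == "__main__":
    node1 = ["Mar 1 08:43:37 node1 kernel: block drbd4: peer( Secondary -> Unknown ) "
             "conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )"]
    node2 = ["Mar 1 08:43:37 node2 kernel: block drbd4: Digest integrity check FAILED."]
    for device, when, peer in explain_broken_pipes(node1, node2):
        print(f"{device} BrokenPipe at {when:%H:%M:%S} matches digest failure on {peer}")
```

With the two lines from this thread it prints `drbd4 BrokenPipe at 08:43:37 matches digest failure on node2`. Matching on the device name as well as the time keeps failures on other DRBD minors from being paired by accident.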
Lars Ellenberg <[email protected]>
Sent by: [email protected]
28.02.2011 18:08
Please reply to: General Linux-HA mailing list <[email protected]>
To: [email protected]
Cc:
Subject: Re: [Linux-HA] Antwort: Re: DRBD BrokenPipe

On Mon, Feb 28, 2011 at 01:39:58PM +0100, [email protected] wrote:
> Lars Ellenberg gave some interesting information about these messages, at
> least if you have verification of your network traffic enabled:

Well, that has little to do with "BrokenPipe" and much more to do with "Digest mismatch", "Digest integrity check FAILED", etc.
And there was no mention of those in the (much too short) log excerpt shown.
So this may be something completely different.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
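The distinction Lars draws follows from how `data-integrity-alg` works: the sending node attaches a digest (md5 in these logs) to each replicated data packet, and the receiving node recomputes and compares it. On a mismatch the receiver, here the slave, logs "Digest integrity check FAILED" and drops the connection (conn( Connected -> ProtocolError )), so the master only sees its socket shut down by the peer and reports BrokenPipe, which is consistent with both logs in this thread. The Python below is a conceptual model of that check under those assumptions, not DRBD's kernel code; `DataPacket`, `send`, `receive` and `ProtocolError` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class DataPacket:
    payload: bytes
    digest: bytes  # computed by the sender over the payload


class ProtocolError(Exception):
    """Raised by the receiver when the digest does not match."""


def send(payload: bytes) -> DataPacket:
    # Sender side: attach an MD5 digest of the data block.
    return DataPacket(payload, hashlib.md5(payload).digest())


def receive(packet: DataPacket) -> bytes:
    # Receiver side: recompute the digest and compare before accepting the data.
    if hashlib.md5(packet.payload).digest() != packet.digest:
        # In the slave's log this shows up as "Digest integrity check FAILED."
        # followed by conn( Connected -> ProtocolError ) and "Connection closed".
        raise ProtocolError("Digest integrity check FAILED")
    return packet.payload


if __name__ == "__main__":
    good = send(b"\x00" * 4096)
    print(receive(good) == good.payload)

    # Simulate the payload differing from what the digest was computed over.
    bad = DataPacket(b"\x01" + good.payload[1:], good.digest)
    try:
        receive(bad)
    except ProtocolError as exc:
        print(f"receiver drops the connection: {exc}")
```

The point of the model is where the error surfaces: only the receiving side can detect the mismatch, so the informative message lands on the slave while the master is left with a generic connection failure.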
