On Wed, Aug 29, 2018 at 11:33:26AM +0000, Nicolas wrote:
> Hello
>
> Sorry for the misunderstanding about the utils version.
>
> I'm using kernel 4.9.88-1+deb9u1 (4.9.0-6-amd64 debian).
> And the module version v8.4.7.
> srcversion: 0904DF2CCF7283ACE07D07A
Not that I think it has anything to do with this particular issue,
but I'd suggest you upgrade to 8.4.11 anyway.

> For example when a node says:
>
> [Tue Aug 28 14:32:38 2018] drbd resource10: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
> [Tue Aug 28 14:32:38 2018] drbd resource10: ack_receiver terminated
> [Tue Aug 28 14:32:38 2018] drbd resource10: Terminating drbd_a_resource
> [Tue Aug 28 14:32:38 2018] drbd resource10: Connection closed
> [Tue Aug 28 14:32:38 2018] drbd resource10: conn( Disconnecting -> StandAlone )
> [Tue Aug 28 14:32:38 2018] drbd resource10: receiver terminated
> [Tue Aug 28 14:32:38 2018] drbd resource10: Terminating drbd_r_resource
> [Tue Aug 28 14:32:38 2018] block drbd10: disk( UpToDate -> Failed )
> [Tue Aug 28 14:32:38 2018] block drbd10: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> [Tue Aug 28 14:32:38 2018] block drbd10: disk( Failed -> Diskless )
> [Tue Aug 28 14:32:38 2018] drbd resource10: Terminating drbd_w_resource
> [Tue Aug 28 14:32:40 2018] drbd resource10: Starting worker thread (from drbdsetup-84 [10222])

Okay. So this is "someone or something" doing a "drbdadm down ; drbdadm up".

> The second says:
>
> [Tue Aug 28 14:35:33 2018] br0: port 8(tap6) entered disabled state
> [Tue Aug 28 14:35:33 2018] device tap6 left promiscuous mode

Uhm, time stamps do not match the excerpt above.
> [Tue Aug 28 14:35:33 2018] br0: port 8(tap6) entered disabled state
> [Tue Aug 28 14:35:37 2018] drbd resource10: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
> [Tue Aug 28 14:35:37 2018] drbd resource10: ack_receiver terminated
> [Tue Aug 28 14:35:37 2018] drbd resource10: Terminating drbd_a_resource
> [Tue Aug 28 14:35:37 2018] block drbd10: new current UUID 629F1036CD6CA2AF:0748EE11C429D3B5:FDAEFCD2E8D9890B:FDADFCD2E8D9890B
> [Tue Aug 28 14:35:37 2018] drbd resource10: Connection closed
> [Tue Aug 28 14:35:37 2018] drbd resource10: conn( TearDown -> Unconnected )
> [Tue Aug 28 14:35:37 2018] drbd resource10: receiver terminated
> [Tue Aug 28 14:35:37 2018] drbd resource10: Restarting receiver thread
> [Tue Aug 28 14:35:37 2018] drbd resource10: receiver (re)started
> [Tue Aug 28 14:35:37 2018] drbd resource10: conn( Unconnected -> WFConnection )

This is "peer node disconnected for some reason".

> [Tue Aug 28 14:35:38 2018] block drbd10: role( Primary -> Secondary )
> [Tue Aug 28 14:35:38 2018] block drbd10: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> [Tue Aug 28 14:35:38 2018] drbd resource10: conn( WFConnection -> Disconnecting )
> [Tue Aug 28 14:35:38 2018] drbd resource10: Discarding network configuration.
> [Tue Aug 28 14:35:38 2018] drbd resource10: Connection closed
> [Tue Aug 28 14:35:38 2018] drbd resource10: conn( Disconnecting -> StandAlone )
> [Tue Aug 28 14:35:38 2018] drbd resource10: receiver terminated
> [Tue Aug 28 14:35:38 2018] drbd resource10: Terminating drbd_r_resource
> [Tue Aug 28 14:35:38 2018] block drbd10: disk( UpToDate -> Failed )
> [Tue Aug 28 14:35:38 2018] block drbd10: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> [Tue Aug 28 14:35:38 2018] block drbd10: disk( Failed -> Diskless )
> [Tue Aug 28 14:35:38 2018] drbd resource10: Terminating drbd_w_resource

And again, this is a "drbdadm down ; drbdadm up".

> And it seems for this example the second node was the origin of this.
> Last night I got another error, saying network failure, but I'm sure there
> was no network issue:
>
> First node:
>
> [Wed Aug 29 01:39:48 2018] drbd resource0: meta connection shut down by peer.
> [Wed Aug 29 01:39:48 2018] drbd resource0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )

... the peer node shut down the connection, and as a result this node
goes through a state called NetworkFailure, then all the motions, then
reconnects, and syncs up.

> Second node:
>
> [Wed Aug 29 01:42:48 2018] drbd resource0: PingAck did not arrive in time.

Again, time stamps do not match up.
But there is your reason for this incident: "PingAck did not arrive in time".
Find out why, or simply increase the ping ack timeout.

> -------- Forwarded message --------
> From: "Lars Ellenberg" <[email protected]>
> To: [email protected]
> Sent: 29 August 2018 12:09
> Subject: Re: [DRBD-user] drbd issue?
>
> On Tue, Aug 28, 2018 at 02:43:47PM +0000, Nicolas wrote:
> > Hi
> >
> > I'm using some servers on debian with ganeti and drbd.
> >
> > Since I upgraded them to debian 9 and drbd 8.9.10-2 (from the debian repo),
>
> "drbd 8.9.10" is the *utils* version
> (drbdadm, drbdsetup, drbdmeta, various scripts ...)
>
> drbd utils version is meanwhile at 9.5.0, btw. And no, that has not
> much to do with what DRBD kernel module driver version you are using:
> since we ship the "unified utils" for both "drbd 8" and "drbd 9",
> which started years ago already, the utils version is decoupled from
> the module versions.
>
> What kernel version,
> and what DRBD module version?
>
> Maybe you want to make sure you use the latest 8.4 version (8.4.11
> currently), and not whatever "ships with the debian kernel"?
>
> > I have a lot of issues with my drbd resources: randomly, some resources
> > show up as disconnected in my dmesg.
> >
> > Today for example:
> >
> > [Tue Aug 28 14:32:38 2018] drbd resource10: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
>
> Well, what does the other node say?
> Hit some timeouts?
> Some strangeness with the new NIC drivers?
> A bug in the "shipped with the debian kernel" DRBD version?

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT
__
please don't Cc me, but send to list -- I'm subscribed
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
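The way the excerpts in this thread were read (a local "Connected -> Disconnecting" followed by a new worker thread looks like "drbdadm down ; drbdadm up", "-> TearDown" means the peer went away, "PingAck did not arrive in time" points at the ping ack timeout) can be sketched as a small Python triage helper for scanning a longer dmesg on both nodes. This is only an illustration under stated assumptions: the rule labels and the names `RULES`, `classify` and `triage` are hypothetical, each log line is assumed to be unwrapped onto a single line, and the rules cover only the messages quoted above, not the full set of DRBD 8.4 kernel messages. A match is a hint about where to look, not a diagnosis.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical triage rules distilled only from the log excerpts in this
# thread; not an exhaustive list of DRBD 8.4 kernel messages.
# The first matching rule wins.
RULES = [
    # The ping sent to the peer was not acknowledged in time.
    ("ping-timeout", re.compile(r"PingAck did not arrive in time")),
    # The peer closed the meta connection.
    ("peer-shutdown", re.compile(r"meta connection shut down by peer")),
    # The peer went away; this node tears the connection down and waits.
    ("peer-disconnect", re.compile(r"conn\( \w+ -> TearDown \)")),
    # This node started disconnecting, as during a "drbdadm down".
    ("local-down", re.compile(r"conn\( \w+ -> Disconnecting \)")),
    # A worker thread was (re)started, as during a "drbdadm up".
    ("local-up", re.compile(r"Starting worker thread")),
]

# Matches resource-level lines such as
# "[Tue Aug 28 14:32:38 2018] drbd resource10: Connection closed".
# Device-level "block drbdN:" lines deliberately do not match.
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] drbd (?P<res>\S+): (?P<msg>.*)$")


def classify(message: str) -> Optional[str]:
    """Return the label of the first rule matching a DRBD message, or None."""
    for label, pattern in RULES:
        if pattern.search(message):
            return label
    return None


def triage(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Collect (timestamp, resource, label) for recognised connection events."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a resource-level DRBD line
        label = classify(match.group("msg"))
        if label is not None:
            events.append((match.group("ts"), match.group("res"), label))
    return events


if __name__ == "__main__":
    sample = [
        "[Tue Aug 28 14:32:38 2018] drbd resource10: peer( Primary -> Unknown ) "
        "conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )",
        "[Tue Aug 28 14:32:40 2018] drbd resource10: Starting worker thread "
        "(from drbdsetup-84 [10222])",
        "[Wed Aug 29 01:42:48 2018] drbd resource0: PingAck did not arrive in time.",
    ]
    for event in triage(sample):
        print(event)
```

Running the same helper over each node's dmesg and lining up the timestamps is one way to see which side initiated a disconnect, which is the question raised about the mismatched time stamps above.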
