No ko-count set, so apparently something different...

On 27.05.21 at 13:37, Rene Peinthor wrote:
> Could still be related to this fix:
>
> * fix timeout detection after idle periods and for configs with ko-count
>   when a disk on a secondary stops delivering IO-completion events
>
> So if you have a ko-count set, this should be fixed.
> Or it is something completely different... ;)
>
> Cheers,
> Rene
>
> On Thu, May 27, 2021 at 1:25 PM Andreas Pflug <[email protected]> wrote:
>
> > I'm running a Proxmox cluster with 3 disk nodes and 3 diskless nodes
> > with DRBD 9.1.1. The disk nodes have storage on md raid6 (8 SSDs each)
> > with a journal on an Optane device.
> >
> > Yesterday, the whole cluster was severely impacted when one node had
> > write problems. There is no indication of any hardware problem, no
> > events whatsoever. What happened, taken from the logs:
> >
> > - One diskless node reports "sending time expired" for some devices on a
> >   specific disk node. After 30 seconds, it disconnects those devices on
> >   that node.
> > - The disk node logs a state change to outdated.
> > - After 80s, the disk node logs "task blocked for more than 120
> >   seconds". These tasks are 8 drbd_r_xxx processes, but also md2_reclaim.
> > - No more logging after that.
> >
> > After that, the whole cluster was severely impacted, with most VMs
> > unresponsive. The node hosts were still accessible, with no more kernel
> > logging.
> >
> > After analyzing the situation, assuming a single node would block
> > everything, that node was rebooted (no normal reboot possible, needed
> > "echo b >/proc/sysrq-trigger"). This did help; everything was back to
> > normal.
> >
> > So apparently there are situations where a backing storage problem might
> > block all DRBD processing in a way that prevents normal timeout
> > detection and subsequent disconnection on other nodes. Reading the 9.1.2
> > release notes, this doesn't seem to be addressed there.
> >
> > Regards,
> > Andreas

_______________________________________________
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
[email protected]
https://lists.linbit.com/mailman/listinfo/drbd-user
