Could still be related to this fix:

 * fix timeout detection after idle periods and for configs with ko-count
   when a disk on a secondary stops delivering IO-completion events
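
For reference, ko-count is set in the net section of a resource
configuration. A minimal sketch, assuming a hypothetical resource named
"r0" and purely illustrative values (not a recommendation for this
cluster):

```
resource r0 {              # "r0" is a hypothetical resource name
    net {
        timeout  60;       # in tenths of a second, so 60 means 6 seconds
        ko-count 7;        # peer is dropped if a write is not completed
                           # within ko-count * timeout (here 7 * 6 s = 42 s)
    }
}
```

Running "drbdadm dump r0" prints the parsed configuration, which should
show whether ko-count is explicitly set for that resource.
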
So if you have a ko-count set, this should be fixed.
Or it is something completely different... ;)

Cheers,
Rene

On Thu, May 27, 2021 at 1:25 PM Andreas Pflug <[email protected]> wrote:

> I'm running a Proxmox cluster with 3 disk nodes and 3 diskless nodes
> with drbd 9.1.1. The disk nodes have storage on md raid6 (8 ssds each)
> with a journal on an optane device.
>
> Yesterday, the whole cluster was severely impacted when one node had
> write problems. There is no indication for any hardware problem, no
> events whatsoever. What happened, taken from the logs:
>
> - one diskless node reports "sending time expired" for some devices on a
> specific disk node. After 30 seconds, it disconnects those devices on
> that node.
> - the disk node logs state change to outdated.
> - After 80s, the disk node logs "task blocked for more than 120
> seconds". These tasks are 8 drbd_r_xxx processes, but also md2_reclaim.
> - No more logging after that.
>
> After that, the whole cluster was severely impacted, most vms
> unresponsive. The node hosts were still accessible, with no more kernel
> logging.
>
> After analyzing the situation, assuming a single node would block
> everything, that node was rebooted (no normal reboot possible, needed
> "echo b >/proc/sysrq-trigger"). This did help, everything back to normal.
>
> So apparently there are situations when a backing storage problem might
> block all drbd processing in a way that prevents normal timeout
> detection and subsequent disconnection on other nodes. Reading the 9.1.2
> release notes, this doesn't seem to be addressed there.
>
> Regards,
> Andreas
>
> _______________________________________________
> Star us on GITHUB: https://github.com/LINBIT
> drbd-user mailing list
> [email protected]
> https://lists.linbit.com/mailman/listinfo/drbd-user
>
