On Mon, Apr 16, 2018 at 12:44:22PM -0400, dehacked wrote:
> Greetings,
>
> I have a small cluster used for Openstack (Newton on centos 7 nodes). I have
> 2 main storage nodes, 1 openstack controller node and 5 'diskless'
> hypervisors. It's configured with the hypervisors as satellite nodes and the
> 3 remaining servers as management nodes with the management volume, though
> only the 2 storage nodes actually hold the rest of the user data.
>
> I'm finding that drbdmanage hangs frequently trying to communicate with the
> service. Even 'drbdmanage ping' will timeout. Examining the service process
> I see it apparently busy connecting to another host which is itself hung.
>
> Any ideas what's wrong or what troubleshooting steps I should be taking here?

Usually this is a sign that at least one of the nodes is busy and tries to
do the same thing (e.g., create a resource, delete a resource, ...) over and
over again. Usually that stops once a fail count is reached. But if an
operation takes longer than the TCP timeout we set, a node might not even be
able to report back that it failed. And then this loops. There have been
fixes in that regard, and the latest version has a configurable TCP timeout.

Enable debugging and check whether you can detect such a "busy loop" in the
syslogs.

> Thanks
>
> drbdmanage version 0.99.14
> kernel driver version 9.0.9
> drbd-utils version 9.1.1
> all built from source tarballs

Every single one of them is outdated. At least try the latest drbdmanage.

Regards, rck
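
P.S. For the syslog check mentioned above, a rough sketch of how one might spot a "busy loop": count how often each node logs the same message and list the ones that repeat suspiciously often. This is only an illustration under assumptions, not part of drbdmanage. The traditional "Mon DD HH:MM:SS host tag[pid]: message" syslog layout, the "drbdmanage" tag substring, the threshold of 20, and the function name find_repeats are all hypothetical choices to adapt to your setup.

```python
import re
from collections import Counter

# Assumed traditional syslog layout: "Apr 16 12:44:22 host tag[pid]: message".
# Adjust this pattern if your syslog daemon uses a different format.
PREFIX = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+(\S+)\s+([^:\[\s]+)(?:\[\d+\])?:\s*(.*)$"
)


def find_repeats(lines, tag="drbdmanage", threshold=20):
    """Return (host, message, count) for messages seen at least `threshold` times.

    `tag` is matched as a substring of the syslog program name; the default
    is an assumption about how the daemon identifies itself in the log.
    """
    counts = Counter()
    for line in lines:
        match = PREFIX.match(line)
        if not match:
            continue  # not in the assumed format, skip it
        host, program, message = match.groups()
        if tag not in program:
            continue
        # Mask digits so retries that differ only in counters or ids
        # are grouped as the same message.
        counts[(host, re.sub(r"\d+", "N", message))] += 1
    return [
        (host, message, n)
        for (host, message), n in counts.most_common()
        if n >= threshold
    ]
```

Calling find_repeats(open("/var/log/messages")) on each node (the path is an assumption, use wherever your syslog ends up) returns the most frequently repeated messages first. A single operation that shows up far more often than everything else is a candidate for the loop described above.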
