Hello,

I'm happy to report to the lists my latest findings on making linstor-controller HA on PVE. For reference:

https://docs.linbit.com/docs/users-guide-9.0/#s-proxmox-ls-HA

On 27/08/2018 11:11, Roberto Resoli wrote:
...

I still have to investigate the conditions under which the drbd storage became unavailable to pve, causing all VMs to stop. Hopefully I will be able to give you more details after examining the logs.

After several trials, I found that quorum on my cluster was too unstable to support HA.

First of all, following Yannis' steps, I moved the controller VM resource out of the linstor-managed ones, so that its definition is not deleted at linstor-satellite startup (see https://lists.gt.net/drbd/users/30049#30049).

This fixed the resource-unavailable issue, but once the controller VM was put under HA, the nodes started to reboot at random, often right after I had deliberately rebooted one of them.

After searching the Proxmox forum, I found that this behaviour is often related to a bad multicast setup. In particular, my suspicion fell on the switch after reading this sentence on the Proxmox wiki:

"This uncovers problems where IGMP snooping is activated on the network but no multicast querier is active"
https://pve.proxmox.com/wiki/Cluster_Manager#_cluster_network

This was exactly my case: my switch had IGMP snooping enabled and there was no querier on the network. After I disabled IGMP snooping (my network is so small that configuring a querier, which would be the correct fix, doesn't make much sense), quorum became much more stable.

I suggest that all Proxmox cluster users carefully read the documentation on multicast configuration and testing:

https://pve.proxmox.com/wiki/Cluster_Manager
https://pve.proxmox.com/wiki/Multicast_notes
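
For the testing part, those pages describe running omping on all cluster nodes at the same time. A small sketch in Python that assembles such command lines follows. The node names pve1 and pve2 are placeholders (only pve3 appears in this thread), omping_command is a hypothetical helper, and the flag values are the ones recalled from the wiki pages above, so check them against the current documentation before use.

```python
import shlex


def omping_command(nodes, count=600, interval=1.0, fast=False):
    """Build the omping argv meant to be run simultaneously on every node.

    Hypothetical helper: it only assembles the command line, it does not run it.
    """
    if len(nodes) < 2:
        raise ValueError("omping needs at least two nodes to be useful")
    argv = ["omping", "-c", str(count), "-i", format(interval, "g")]
    if fast:
        argv.append("-F")  # used in the wiki's short high-rate test
    argv.append("-q")
    argv.extend(nodes)
    return argv


# Placeholder node names; replace with the real hostnames or addresses.
nodes = ["pve1", "pve2", "pve3"]

# Roughly ten minutes at one packet per second: long enough for an
# IGMP snooping timeout without a querier to show up as multicast loss.
long_test = omping_command(nodes)

# Short high-rate burst test.
burst_test = omping_command(nodes, count=10000, interval=0.001, fast=True)

print(shlex.join(long_test))
print(shlex.join(burst_test))
```

The printed command has to be started on every listed node at roughly the same time, otherwise the nodes do not see each other's packets.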

At the moment I can report only a bunch of these messages in syslog:

Aug 25 22:49:04 pve3 pvestatd[2598]: malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before "(end of string)") at /usr/share/perl5/PVE/Storage/Custom/LINSTORPlugin.pm line 321.

These are generated when Proxmox queries the linstor plugin for the current status and expects a response in JSON format, but the configured controller is not responding.
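
The "character offset 0" in that message suggests the decoder was handed an empty string. As an illustration only (this is Python, not the plugin's Perl code, and the sample reply is made up), decoding an empty reply fails in the same way, and a caller can treat that case as "controller unreachable" instead of letting the error propagate:

```python
import json


def parse_controller_reply(raw):
    """Return the decoded JSON reply, or None if it is empty or malformed.

    Hypothetical helper illustrating the failure mode; not part of LINSTOR.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # An empty string fails right at offset 0, like the syslog message.
        return None


# No answer from the controller: nothing to decode.
print(parse_controller_reply(""))  # None

# Made-up, well-formed reply, only to show the success path.
print(parse_controller_reply('[{"nodes": []}]'))  # [{'nodes': []}]
```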

Finally, linstor controller HA is currently working quite well; in particular, I find it handy to be able to live-migrate the controller elsewhere when a node needs maintenance.

bye,
rob




_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
